LhasaLimited The #shef2019 Eighth Joint Sheffield Conference on Chemoinformatics takes place next week. Will we see you there at our presentations? https://t.co/c2SpZcJgqe https://t.co/JljjKyIid5

    --> WendyAnneWarr @LhasaLimited Sincerely hope so.

CDDVault [JUNE 17-19] Stop by our exhibit table and have a chat with Renate Baker and Susana Tomasio from CDD about your discovery informatics workflows.


#CDD #ElectronicLabNotebook #DrugDiscovery #ELN #MedicinalChemistry #Biologics #shef2019 https://t.co/meG2MEAXF9

Optibrium The Optibrium team will be attending the 8th Sheffield Conference on Chemoinformatics at The Edge, and presenting the poster ‘N- and S-Oxidation Model of the Flavin-containing Monooxygenases’. https://t.co/xp5MGCHFgV #shef2019 #drugdiscovery https://t.co/Nzcyinp52z

MedChemica Packing our poster for the 8th Joint Sheffield Conference on Chemoinformatics, showing tox predictions in a way chemists can understand and audit quickly. #ExplainableAI #shef2019 https://t.co/kXteXZDV0g #Cheminformatics #Conference https://t.co/B7zHPdCYQ4

cthoytp News from the @adlvdl on how to start a fight at #shef2019: ask if it is Cheminformatics or chemoinformatics https://t.co/mcm8CtxXyL

    --> adlvdl @cthoytp The best way to make friends during the conference

peter_ertl Tomorrow I am leaving for the #shef2019 #cheminformatics conference, where I will be presenting the talk "Encyclopaedia of functional groups" https://t.co/lZzRWSoDAs. This will already be my 7th "Sheffield" and I am very much looking forward to it!
    --> nathanbroon @peter_ertl You forgot the Sheffield/Erlangen ‘o’! See you tomorrow! #Chemoinformatics 😉
        --> Chris_de_Graaf @nathanbroon @peter_ertl Don’t forget the Obernai ‘o’ ;-)!

LISResearch RT @adlvdl: The conference programme for the Eighth Joint Sheffield Conference on Chemoinformatics #shef2019 is available at https://t.co/JvWNmgjW8h. A lot of great topics and interesting science. See you all in Sheffield next month! #conference #chemoinformatics #cheminformatics

BZdrazil Heading to Manchester for my first (can’t believe this) Sheffield #cheminformatics conference. Looking forward to talking about how we are tracking target innovation in drug discovery, providing both a chemical and a biological perspective @nathanbroon @rguha #shef2019
    --> nathanbroon @BZdrazil @rguha Just leaving the house in London too. See you soon!
        --> BZdrazil @nathanbroon @rguha Temperature difference will be shocking (from 33 down to 13 degrees Celsius)... see you in the cold!
            --> pwk2013 @BZdrazil @nathanbroon @rguha Have a great conference and make sure to deal with any mention of #PAINS filters faster than you can say #SingletOxygen
        --> GJPvWesten @nathanbroon @BZdrazil @rguha Just landed... seeing five unused 737 MAXs, strange sight.... https://t.co/oSPpAChxkc

            --> BZdrazil @GJPvWesten @nathanbroon @rguha Just landed too... https://t.co/wgp6MBBHtM

                --> rguha @BZdrazil @GJPvWesten @nathanbroon I am quite jealous :(
                    --> BZdrazil @rguha @GJPvWesten @nathanbroon It’s a pity you cannot join! We’ll have an extra 🍺 for you tonight.
    --> nathanbroon @BZdrazil @rguha Just realised this is my 8th conference out of nine they’ve run. The first one was before I started.
        --> nathanbroon @BZdrazil @rguha Scratch that, seventh out of eight. My first was in 2001 when I was still a student in Sheffield.
            --> rguha @nathanbroon @BZdrazil https://t.co/YrzWHGpUwP

                --> nathanbroon @rguha @BZdrazil I don’t know what this means…
                    --> BZdrazil @nathanbroon @rguha Lonely soldier?
                    --> rguha @nathanbroon @BZdrazil Ha. It needs the other frames https://t.co/B5ySoxyzfb
    --> curephile @BZdrazil Just curious, what is "tracking target innovation"? Is it identifying new druggable targets?
        --> BZdrazil @curephile Target innovation includes the discovery of new drug targets as well as the exploitation of existing ones....

GJPvWesten Leiden on its way to #shef2019. Proper British taxi experience.. @LenselinkBart @NymphBrandon @OlivierBeqgn @CDDLeiden https://t.co/F8kpHpCyc9

    --> cthoytp @GJPvWesten @LenselinkBart @NymphBrandon @OlivierBeqgn @CDDLeiden See you all soon!

nathanbroon On the train to Sheffield for the Eighth Joint Conference on #Chemoinformatics, my seventh. I'm representing @RSC_CICAG here and will be taking over their Twitter for the next few days. #Shef2019 #Cheminformatics #CompChem https://t.co/tW052QCIkz

    --> ChemicallyLego @nathanbroon @RSC_CICAG I see #TrainGin
        --> nathanbroon @ChemicallyLego @RSC_CICAG You sound surprised!
            --> ChemicallyLego @nathanbroon @RSC_CICAG Not in any way 😋
    --> pschmidtke @nathanbroon @RSC_CICAG Product placement @marksandspencer ;) have fun and tweet the interesting stuff plz!
    --> SthLondonNick @nathanbroon @RSC_CICAG Not sure if that's a happy face or not...
    --> MedChemica @nathanbroon @RSC_CICAG See you there, looking forward to a beer and catch up
    --> matthew__ashton @nathanbroon @RSC_CICAG Good luck Dr Brown!
    --> ChemConnector @nathanbroon @RSC_CICAG I didn’t make it to any of them when working for RSC and less likely now working for EPA
        --> nathanbroon @ChemConnector @RSC_CICAG That’s a shame. Always a good conference. I’m not employed by RSC, just representing one of the SIGs. I think some RSC employees have attended before.
            --> ChemConnector @nathanbroon @RSC_CICAG I know where you work Nathan 😁 I was just commenting that I spent so much time in the U.K. over five years with RSC I should’ve made it ...
                --> nathanbroon @ChemConnector @RSC_CICAG It’s a shame. Always a good conference. However, I studied under Peter Willett so I’m a little biased.
    --> DrAnthonyMeijer @nathanbroon @RSC_CICAG Welcome back!

nathanbroon Who have I missed? @DrJoshuaBox @RichardSherhod @DrBobClark1 @peter_ertl @marwinsegler @GJPvWesten @BZdrazil @adlvdl @WendyAnneWarr @CKannas @al_dossetter @hjuinj @LeeSteinberg @jwmay @JohnDelaney8 @gmm @LenselinkBart @NymphBrandon @OlivierBeqgn @Chris_de_Graaf #Shef2019
    --> GJPvWesten @nathanbroon @DrJoshuaBox @RichardSherhod @DrBobClark1 @peter_ertl @marwinsegler @BZdrazil @adlvdl @WendyAnneWarr @CKannas @al_dossetter @hjuinj @LeeSteinberg @jwmay @JohnDelaney8 @gmm @LenselinkBart @NymphBrandon @OlivierBeqgn @Chris_de_Graaf Missed you too! ;)
        --> nathanbroon @GJPvWesten @DrJoshuaBox @RichardSherhod @DrBobClark1 @peter_ertl @marwinsegler @BZdrazil @adlvdl @WendyAnneWarr @CKannas @al_dossetter @hjuinj @LeeSteinberg @jwmay @JohnDelaney8 @gmm @LenselinkBart @NymphBrandon @OlivierBeqgn @Chris_de_Graaf https://t.co/PZOUO4ql2R

    --> mostafabenh @nathanbroon @DrJoshuaBox @RichardSherhod @DrBobClark1 @peter_ertl @marwinsegler @GJPvWesten @BZdrazil @adlvdl @WendyAnneWarr @CKannas @al_dossetter @hjuinj @LeeSteinberg @jwmay @JohnDelaney8 @gmm @LenselinkBart @NymphBrandon @OlivierBeqgn @Chris_de_Graaf In case you missed it 

GJPvWesten A great @CDDLeiden presence at #Shef2019 
-Liu: Reinforcement learning for molecular generation (talk)
-Burggraaff: Kinase polypharmacology modeling (talk)
-Bongers: Machine learning models of solute carriers (poster)
-Bequignon: Machine learning for DILI prediction (poster)
    --> pschmidtke @GJPvWesten @CDDLeiden Anything to read for the #kinase #polypharmacology modeling part already, @GJPvWesten?
        --> CDDLeiden @pschmidtke @GJPvWesten Apologies, not yet. We are waiting for the DREAM consortium.

GJPvWesten We're at the York! @CDDLeiden #shef2019

egonwillighagen #shef2019 is now in @wikidata https://t.co/aQUh9W1nch (well, part of it) #Scholia
    --> egonwillighagen @wikidata if you want to add more, go here: https://t.co/FZ1PhzaNj5
        --> egonwillighagen @wikidata (for example, to list yourself as participant ;)

cthoytp Pre #shef2019 dinner at the York https://t.co/4d9r4E6uBA

    --> cdsouthan @cthoytp Would have been great to have been at #shef2019 but I am at #Elixir19 so mustn't grumble

LeeSteinberg Hanging out with @robertshaw383 and @ErenSlate before #shef2019 - making sure my PowerPoint animations are perfect before my talk tomorrow!
    --> nathanbroon @LeeSteinberg @robertshaw383 @ErenSlate Best of luck!

WillPitt1 On my way to #Shef2019. Ask me about this job if you are interested. 

RSC_CICAG The @RSC_CICAG are proud sponsors of the Eighth Joint Sheffield Conference on Chemoinformatics #Shef2019 #Chemoinformatics #Cheminformatics #CompChem #RealTimeChem https://t.co/E2SFLszfNJ

Optibrium Hear Tom Whitehead (https://t.co/PxFLwAo8pi) at the #shef2019 8th Sheffield Conference on Chemoinformatics presenting ‘Imputation of assay bioactivity data using deep learning’. https://t.co/o5uxLVfSXv  #drugdiscovery #artificialintelligence https://t.co/kxqQ8tU7co

RSC_CICAG Conference is open at the Eighth Joint Sheffield Conference on Chemoinformatics #Shef2019 #Chemoinformatics #Cheminformatics #CompChem

baoilleach #shef2019 Sheffield Conf on Chemoinformatics....my tweets begin now..
    --> ozkirimli_elif @baoilleach Thank you so much for the detailed tweets!!

baoilleach #shef2019 Val Gillett shows overview of attendees to last eight conferences. Maxed out in 2007, declined to 2016, back up a little bit in 2019. 21st birthday.
    --> baoilleach #shef2019 Tom Whitehead of Intellegens on Imputation of assay bioactivity data using deep learning (collab with Optibrium)
        --> baoilleach #shef2019 We're a startup, but we've got some proven applications in materials discovery (super alloys), patient analytics, but today drug discovery.
            --> baoilleach #shef2019 Data is usually sparse or missing, and so they work with data of this type; Alchemite is the name of the method. Shows example of Novartis dataset for benchmarking machine learning (Martin et al 2017). Remove 95% of data and try to impute it.
                --> baoilleach #shef2019 The Novartis dataset is interesting because you are trying to extrapolate from known data to unknown, rather than randomly dividing dataset into training+test, the test data is 'outside' the training.
                    --> baoilleach #shef2019 Compared against RF model built on StarDrop descriptors to predict pIC50. RF not v good at extrapolation: R2 is negative.
                        --> baoilleach #shef2019 Deep learning can do better as it uses information from multiple assays at the same time, and gets R2 of 0.46. What we want to know is how confident a prediction is.
                            --> baoilleach #shef2019 Can plot % missing data imputed versus RMSE to identify threshold for accuracy. An anon big pharma company has given access to 0.75M of their bioactivity data to make predictions. R2 of 0.50.
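
[Ed] Two points from this thread can be made concrete. A negative R2 follows from the definition R2 = 1 - SS_res/SS_tot: any model whose squared errors exceed those of always predicting the test-set mean scores below zero, which is easy to hit when extrapolating. The accuracy threshold can be sketched by ranking predictions by an uncertainty estimate and computing RMSE over only the most confident fraction. This is a minimal sketch, not Alchemite's implementation; the function names, the toy numbers and the assumption that each prediction comes with an uncertainty value are all illustrative.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (can be negative)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse_at_coverage(y_true, y_pred, uncertainty, fraction):
    """RMSE over the given fraction of predictions with the lowest uncertainty."""
    ranked = sorted(zip(uncertainty, y_true, y_pred), key=lambda row: row[0])
    k = max(1, round(fraction * len(ranked)))
    kept = ranked[:k]  # the k most confident predictions
    return math.sqrt(sum((t - p) ** 2 for _, t, p in kept) / k)


y_true = [5.0, 6.0, 7.0, 8.0]
extrapolating = [8.0, 7.5, 6.0, 5.0]  # worse than predicting the mean
print(r_squared(y_true, extrapolating))  # -3.25

uncertainty = [0.1, 0.2, 0.9, 0.8]  # hypothetical per-prediction uncertainties
predicted = [5.1, 6.2, 9.0, 5.0]
for fraction in (0.5, 1.0):
    print(fraction, rmse_at_coverage(y_true, predicted, uncertainty, fraction))
```

Plotting rmse_at_coverage against the fraction imputed gives the kind of curve described: pick the largest fraction whose RMSE still meets the project's acceptance criterion.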

RSC_CICAG This is the 21st Anniversary of the Sheffield Conference on Chemoinformatics. The first conference took place in 1998. #Shef2019

dr_greg_landrum And here we go...

RSC_CICAG New for this year is the Peter Willett Award for Outstanding Poster Presentation in recognition of Peter's outstanding career. @RSC_CICAG is delighted to be sponsoring this new award. #Shef2019

DrJoshuaBox Sheffield Chemoinformatics Conference this week; it's celebrating its 21st birthday! #Shef2019 https://t.co/Rn7RZPK04n

RSC_CICAG First speaker of the day is Tom Whitehead from @intellegensai speaking on Imputation of assay bioactivity data using deep learning. Paper is here from @JCIM_ACS https://t.co/UPsyyoyyXq #Shef2019 #CompChem
    --> RSC_CICAG This work is a collaboration between @intellegensai and @Optibrium #Shef2019
        --> RSC_CICAG .@intellegensai deep learning platform is called Alchemite. #Shef2019
            --> RSC_CICAG This work used the Profile-QSAR benchmarking set from Novartis: Profile-QSAR 2.0: Kinase Virtual Screening Accuracy Comparable to Four-Concentration IC50s for Realistically Novel Compounds https://t.co/vaVnwz7xZz #Shef2019
                --> RSC_CICAG The Profile-QSAR held-out test set is a realistic representation of outliers in which we are interested for the next iteration of the design-make-test-analyse cycle. #Shef2019
                    --> RSC_CICAG The aim of this work is to accurately impute missing values from existing data, benchmarked against random forest regression. #Shef2019
                        --> RSC_CICAG Predicting pretty well compared to others who have tried. Now on to model confidence in predictions. #Shef2019
                            --> RSC_CICAG Filling in missing data to increase accuracy and confidence. Allows you to define acceptance criteria for predictions in a project ahead of time. #Shef2019
                                --> RSC_CICAG Hot off the press: access to real project data from an unnamed Pharma company of 710,305 compounds across 2,171 assays and 3,568 endpoints #Shef2019
                                    --> RSC_CICAG Summary: train across all endpoints simultaneously capturing activity-activity correlations using sparse data as input. #Shef2019
                                        --> RSC_CICAG Tom's abstract is here: https://t.co/wcfOxijKbO #Shef2019

DrJoshuaBox "Imputation of Assay Bioactivity Data Using Deep Learning" https://t.co/zfZKxtGr9d #shef2019

LhasaLimited We're at the #shef2019 Eighth Joint Sheffield Conference on Chemoinformatics today week. Look out for our presentations by Senior Scientist Sebastien and Research Leader Thierry: https://t.co/c2SpZcJgqe https://t.co/JAVwMovIng

WendyAnneWarr #shef2019 I AM here folks but it took me 10 mins to get WiFi working. I am familiar with the Intellegens @Optibrium work. Heard good talk at Optibrium consultants day and there is stuff on the Optibrium website https://t.co/NyfmhfqccY

WendyAnneWarr #shef2019 for intellegens @Optibrium work see also @JCIM 2019 59 1197

WendyAnneWarr #shef2019 a big pharma has now given Intellegens 750K compounds and data to test Alchemite https://t.co/3VrfBF4pVj preliminary results R-squared 0.5

RSC_CICAG Please tweet about the conference using #Shef2019: @DrJoshuaBox @RichardSherhod @DrBobClark1 @peter_ertl @marwinsegler @GJPvWesten @BZdrazil @adlvdl @WendyAnneWarr @CKannas @al_dossetter @hjuinj @LeeSteinberg @jwmay @JohnDelaney8 @gmm @LenselinkBart @NymphBrandon
    --> RSC_CICAG #Shef2019 @OlivierBeqgn @Chris_de_Graaf @griffen_ed @WillPitt1 @dr_greg_landrum @cthoytp @OleinikovasV @webbres @baoilleach
        --> RSC_CICAG #Shef2019 @MedChemica @CDDVault @cressetgroup @Optibrium @LhasaLimited @dotmatics @CDDLeiden @nmsoftware @MgmsUpdates @RSC_CICAG @Schrodinger @OpenEyeSoftware @knime

WendyAnneWarr #shef2019 wet start to https://t.co/bQN5FC7RYk. @SciFinder https://t.co/VClUMxYYJB

RSC_CICAG Great to have you here, Wendy! #Shef2019 https://t.co/4kaBcYHoko

RSC_CICAG Next up is Stephen Pickett from GlaxoSmithKline talking on Validating automated design and active learning. Abstract here: https://t.co/kHXLPlX6f3 #Shef2019
    --> RSC_CICAG Molecular Design at GSK uses predictive modelling, MPOs, hit finding, understanding the protein, mechanisms, informatics, and deploying a platform for analysis. #Shef2019
        --> RSC_CICAG QSAR Workbench from GSK: https://t.co/30p0K23gN1 #Shef2019
            --> RSC_CICAG GSK uses Live Design from @Schrodinger and its ability to define multi-parameter objective profiles. #Shef2019
                --> RSC_CICAG GSK also uses Matched Molecular Pair Analysis and published an efficient algorithm: https://t.co/utxKzSpD9u #Shef2019
                    --> RSC_CICAG GSK is also using Free Energy Perturbation and SOA Interaction Energy Prediction. #Shef2019
                        --> RSC_CICAG Pickett: "Application relies on intuition, patchy utilisation, non-experts in the tools and algorithms, evaluating rather than generating ideas." #Shef2019
                            --> RSC_CICAG "Systematic application requires a platform and going from 1,000s of molecules down to 100s, many iterations to few, from four years down to one." #Shef2019
                                --> RSC_CICAG GSK's platform is called Bradshaw, after John Bradshaw, a pioneering chemoinformatics scientist. #Shef2019
                                    --> RSC_CICAG Pickett paper: "De Novo Molecule Design by Translating from Reduced Graphs to SMILES"
https://t.co/Mroy1BZEhI #Shef2019
                                        --> RSC_CICAG GSK using BRICS for molecular fragmentation: On the Art of Compiling and Using 'Drug‐Like' Chemical Fragment Spaces
 https://t.co/VvG5DXvUKp #Shef2019
                                            --> RSC_CICAG BRICS builds (pardon the pun) on RECAP from GSK: RECAP Retrosynthetic Combinatorial Analysis Procedure:  A Powerful New Technique for Identifying Privileged Molecular Fragments with Useful Applications in Combinatorial Chemistry https://t.co/Aq2tszeDQO #Shef2019
                                                --> RSC_CICAG BioDig as a molecular generator: apply fragmentation rules, search those fragments for transforms that give the profile change you are looking for, and apply that transform to your original molecule. #Shef2019
                                                    --> RSC_CICAG Pickett compares their work with this from @marwinsegler: Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks
https://t.co/eBkhFOHjPU #Shef2019
                                                        --> RSC_CICAG Pickett and GSK used RNNs to understand how much data you need, in collaboration with @SAmabilino #Shef2019
                                                            --> RSC_CICAG GSK wrote a seq-to-seq algorithm to go from reduced graph SMILES back to possible molecular structures (typically one-to-many relationship) #Shef2019
                                                                --> RSC_CICAG Systematic Ideation: evaluation of three molecular evolution programs: BRICS, BioDig, and RG2SMI. #Shef2019
                                                                    --> RSC_CICAG Jacob Bush will be presenting on this work: A Turing test for molecular generators at our Artificial Intelligence in Chemistry meeting. https://t.co/iCplPrGhsU #AIChem19 #Shef2019 #CompChem #ArtificialIntelligence #MachineLearning #AI #ML #Chemoinformatics #Cheminformatics
                                                                        --> RSC_CICAG Active Learning. Typical models are used exploitatively. Better to explore the space where our predictions are low in confidence. #Shef2019
                                                                            --> RSC_CICAG Pilot screen of 40k, single-shot of 5-10k from HTS library, full curves for 5k, specific endpoint models. Cf. iterative screening around fifteen years ago. #Shef2019
                                                                                --> RSC_CICAG When talking about exploration and exploitation it is always worth highlighting this paper from James March: Exploration and Exploitation in Organizational Learning: https://t.co/NIhmYrpXVK #Shef2019
                                                                                    --> RSC_CICAG GSK is continuing the evaluation of the algorithms and methods in BRADSHAW and using them in anger. #Shef2019
                                                                                        --> RSC_CICAG Question asked if the work focusses too much on mimicking the chemist design process with the assumption this is the oracle of truth. #Shef2019
                    --> WendyAnneWarr @RSC_CICAG #shef2019 @JCIM_ACS
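
[Ed] Matched molecular pair analysis, listed above among GSK's tools, groups compounds that differ by a single substituent on a shared core and summarises the property change for each transform. Below is a minimal pure-Python sketch, assuming the molecules have already been cut into (core, substituent) pairs; real implementations such as the Hussain and Rea algorithm do that fragmentation on the molecular graph. The records and values are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def mmp_transforms(records):
    """Mean property change per substituent transform.

    records: iterable of (core, substituent, property_value).
    Returns {(from_substituent, to_substituent): mean change} over all
    pairs of compounds that share the same core.
    """
    by_core = defaultdict(list)
    for core, sub, value in records:
        by_core[core].append((sub, value))

    deltas = defaultdict(list)
    for members in by_core.values():
        for (sub_a, val_a), (sub_b, val_b) in combinations(members, 2):
            if sub_a == sub_b:
                continue
            # record the transform in both directions
            deltas[(sub_a, sub_b)].append(val_b - val_a)
            deltas[(sub_b, sub_a)].append(val_a - val_b)
    return {transform: mean(d) for transform, d in deltas.items()}


records = [  # hypothetical (core, substituent, log clearance) values
    ("core1", "[*]H", 2.0),
    ("core1", "[*]F", 1.5),
    ("core2", "[*]H", 3.0),
    ("core2", "[*]F", 2.0),
    ("core2", "[*]OC", 3.5),
]
transforms = mmp_transforms(records)
print(transforms[("[*]H", "[*]F")])  # -0.75
```

Ranking transforms by their mean change (for example, the largest drop in clearance) gives rules of the kind described for BioDig; counting the pairs behind each transform would show how well supported each rule is.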

WendyAnneWarr #shef2019 next up Stephen Pickett of GSK on validating automated design and active learning. GSK has automated design environment called BRADSHAW. Trad approaches (MMP and BRICS) compared with generative algorithms from deep learning neural networks

baoilleach #Shef2019 Stephen Pickett (GSK) on validating automated design and active learning
    --> baoilleach #shef2019 GSK have been building a common platform for automated molecular design. Automating the traditional "design make test" cycle: screen/analysis/what to make next/make it
        --> baoilleach #shef2019 Have automated machine learning model creation: "QSAR workbench". (Cox et al JCAMD 2013). More than 60 global ML models for ADMET props, published to LiveDesign, etc. Some models have replaced screening assays.
            --> baoilleach #shef2019 BioDig - Automated SAR extraction. Ref to Hussain/Rea (2010). Creates transform rule for various transformations, e.g. how it improves clearance. Can dig down into nbr context at various levels.
                --> baoilleach #Shef2019 Free E Perturbation - SOA interaction E prediction. Not yet an automated platform but trying to integrate into workflows.
                    --> baoilleach #Shef2019 Great science but...application relies on "intuition", patchy utilisation, non-experts should be able to use them, should be evaluating rather than generating ideas
                        --> baoilleach #Shef2019 Systematic application requires a platform. "BRADSHAW" (!) is GSK's automated mol des platform. Developed with Tessella. UI that integrates with LiveDesign. Webservices. Using PipelinePilot to integrate our own webservices into it.
                            --> baoilleach #Shef2019 Not a workflow tool. Just a common interface that embeds "best practice". Shows example of task configuration where the user chooses in the molecule generator "I want an oral drug". Can access more options if necessary.
                                --> baoilleach #Shef2019 GSK molecular generators. Reaction-based "BRICS"; knowledge based (BioDig and Fit&Predict); and Deep learning "RG2SMI". BRICS (Degen et al 2008) is based on combining RECAP fragments back together.
                                    --> baoilleach #Shef2019 GSK BRICS does fragment replacement based on what chemists have made. Molecules are fragmented and attachment points labelled with isotopes [ed: missed the meaning of specific isotopes]. Replacements done by replacing with fragments with the same labels.
                                        --> baoilleach #Shef2019 Deep learning. How many molecules would we need for transfer learning? A PhD student (Silvia Amabilino, Uni Bristol) did some experiments on this. Similarity to input data should improve as the algorithm learns. Works fine with larger dataset; not so much with smaller.
                                            --> baoilleach #Shef2019 Built Seq2Seq algorithm to go from reduced graph back to SMILES. Because of stochastic nature, can generate lots of molecules within the constraints of the reduced graph. (Pogany JCIM 2019)
                                                --> baoilleach #Shef2019 Are we generating the right molecules? Jacob Bush (GSK) compared the three methods versus his own ideas. Took this forward by asking several med chemists to come up with ideas - can the methods reproduce. BioDIG is the winner by a long shot (gets 90% of the compounds).
                                                    --> baoilleach #Shef2019 BRICS okay, but DNN poor. Perhaps not surprising given the nature of the problem. But were the chemists right and the algorithms wrong? Assessing the machine results: can a chemist distinguish between the human- and machine-generated ones? Like/dislike.
                                                        --> baoilleach #Shef2019 Active learning. Exploit the model vs create novelty vs high confidence in prediction. Model is updated in every iteration. Example of application to complex phenotypic assay.
                                                            --> baoilleach #Shef2019 Question from Val about generating molecules that are non-obvious. Answer: the first step is to generate molecules that they expect.
                                        --> dr_greg_landrum @baoilleach I liked the approach for using BRICS to suggest related molecules to a starting point instead of the usual "de novo" approach used in the @RDKit_org
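
[Ed] The fragment-replacement step described in this thread can be sketched as a lookup: each fragment's attachment points carry isotope labels (written [n*] in SMILES, as in RDKit's BRICS output), and a fragment may be swapped for any library fragment with the same multiset of labels. A minimal pure-Python sketch; the fragment strings are made up, the matching rule follows the tweet's description rather than GSK's actual code, and a real tool would also reassemble and sanitise the product molecule (for example with RDKit).

```python
import re
from collections import Counter

# Matches isotope-labelled attachment points such as [4*] or [16*]
ATTACHMENT = re.compile(r"\[(\d+)\*\]")


def attachment_labels(fragment_smiles):
    """Multiset of attachment-point labels in a fragment SMILES string."""
    return Counter(int(label) for label in ATTACHMENT.findall(fragment_smiles))


def replacements(fragment_smiles, library):
    """Library fragments whose attachment labels match the query fragment's."""
    target = attachment_labels(fragment_smiles)
    return [f for f in library
            if f != fragment_smiles and attachment_labels(f) == target]


library = ["[16*]c1ccccc1", "[16*]c1ccncc1", "[4*]CC", "[16*]c1ccc([16*])cc1"]
print(replacements("[16*]c1ccccc1", library))  # ['[16*]c1ccncc1']
```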

WendyAnneWarr #shef2019. Pickett on automating the traditional design make test cycle. Cox et al. JCAMD 2013 27 321 covers the QSAR Workbench

WendyAnneWarr #shef2019 GSK wants to reduce number of compounds and iterations and reduce 4 years to one

WendyAnneWarr #shef2019 BRADSHAW built in collaboration with Tessella. Pipeline Pilot for web services

WendyAnneWarr #shef2019 BRADSHAW has embedded best practice

WendyAnneWarr #shef2019 Pogany Pickett et al. 2019 59 1237 @JCIM_ACS

adlvdl Thank you to @baoilleach @RSC_CICAG @WendyAnneWarr and others live tweeting the conference #shef2019

WendyAnneWarr #shef2019 Pickett outlines RECAP and BRICS. Published work. @JCIM_ACS

WendyAnneWarr #shef2019 Next GSK alg replaces fragments with attachments from GSK space. BioDig for improved clearance: search BioDig for transforms. Segler et al. in ACS Central Science cited re neural networks @ACSCentSci

WendyAnneWarr #shef2019 Training an RNN on ChEMBL needs a large dataset. Pickett used deep learning to go from reduced graph to SMILES. Published 2019 (see earlier tweet)

WendyAnneWarr #shef2019 Jacob Bush of GSK evaluated 3 molecular evolution programmes. Ideas from 13 medchem leaders obtained. BioDig did really well versus medchemists. RG2SMI not quite so good.

WendyAnneWarr #shef2019 GSK machine generated ideas indistinguishable from chemists’ ones

cthoytp Presenters at #shef2019 please prepare your presentations for the low contrast/small screen

WendyAnneWarr #shef2019 Pickett outlines active learning on typical phenotypic assay. Results over time and quality of alg diagrams shown.
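
[Ed] The explore-versus-exploit trade-off in active learning, as described in the Pickett talk, can be sketched with a model ensemble: the mean prediction drives exploitation, and the spread between ensemble members stands in for low confidence, which drives exploration. A minimal sketch; the scoring rule (mean plus a weight times the standard deviation, an upper-confidence-bound style choice) and all names are assumptions for illustration, not GSK's method.

```python
from statistics import mean, pstdev


def select_batch(candidates, ensemble_predictions, batch_size, explore_weight=1.0):
    """Pick the compounds to test in the next iteration.

    ensemble_predictions[i] lists one prediction per ensemble model for
    candidates[i]. Score = mean + explore_weight * spread: a weight of 0
    is pure exploitation; larger weights favour compounds the models
    disagree on.
    """
    scored = []
    for name, preds in zip(candidates, ensemble_predictions):
        score = mean(preds) + explore_weight * pstdev(preds)
        scored.append((score, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:batch_size]]


candidates = ["A", "B", "C"]
predictions = [
    [6.0, 6.0, 6.0],  # confident, moderate activity
    [4.0, 6.0, 8.0],  # models disagree
    [6.5, 6.5, 6.5],  # confident, best mean
]
print(select_batch(candidates, predictions, 1, explore_weight=0.0))  # ['C']
print(select_batch(candidates, predictions, 1, explore_weight=1.0))  # ['B']
```

In a loop, the selected compounds would be assayed and the ensemble retrained on the new results before the next selection, matching the "model is updated in every iteration" point above.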

WendyAnneWarr #shef2019 GSK now testing this on real life projects

WendyAnneWarr #shef2019 besides Tessella, Pickett also acknowledges @3dsBIOVIA RDKit @RDKit_org @chemaxon and @Schrodinger Live Design

RSC_CICAG Last speaker of the morning session is Xuhan Liu from Leiden University: Drug molecule de novo design by multi-objective reinforcement learning for polypharmacology. Abstract here: https://t.co/796J3C8CCL #Shef2019
    --> RSC_CICAG This work is published here in @jcheminf: An exploration strategy improves the diversity of de novo ligands using deep reinforcement learning: a case for the adenosine A2A receptor https://t.co/hrPNzXdx06 #Shef2019
        --> RSC_CICAG Properties desired for molecules: high affinity for target of interest, low affinity for off-targets, large diversity, and to some extent similar to known ligands. #Shef2019
            --> RSC_CICAG Dataset converted from structures to SMILES representation for the RNN generator, and a token vocabulary generated as in natural language processing #Shef2019
                --> RSC_CICAG Dataset is from ZINC (-2 < logP < 6, 200 < MW < 600), one million molecules uniformly sampled: ZINC 15 – Ligand Discovery for Everyone https://t.co/PThuoLGSJQ #Shef2019 @chem4biology
                    --> RSC_CICAG Molecules generated with the trained RNN model are able to cover the desired chemical space. #Shef2019
                        --> RSC_CICAG Reinforcement learning is the interplay between an agent and its environment. Actions of agent on environment, giving a reward and updated state. #Shef2019
                            --> RSC_CICAG Loss function is from Williams in 1992: Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning https://t.co/7ZwR5M192H #Shef2019
                                --> RSC_CICAG Exploration strategy borrows ideas from evolutionary algorithms using crossover and mutation. #Shef2019
                                    --> RSC_CICAG Code available here for exploratory design of molecules for multiple targets: An exploration strategy improves the diversity of de novo drug design using deep reinforcement learning https://t.co/g7vdShVemv #Shef2019
                                        --> RSC_CICAG Planning to test this process prospectively by synthesising and testing designed compounds. #Shef2019
        --> cthoytp @RSC_CICAG @jcheminf Would love to see links to code that supported the analysis as well
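
[Ed] Turning SMILES into tokens for an RNN generator, as described in this thread, is usually done with a regular expression that keeps multi-character units such as Cl, Br and bracket atoms together. A minimal sketch; the regex is a common choice in the SMILES language-model literature and not necessarily the one used in DrugEx, and the <start>/<end> token names are assumptions.

```python
import re

# Bracket atoms first, then two-letter halogens, then single-character tokens
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|"
    r"\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles):
    """Split a SMILES string into tokens; fail loudly on unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def build_vocabulary(smiles_list, special=("<start>", "<end>")):
    """Map each token to an integer index; special tokens come first."""
    tokens = sorted({t for s in smiles_list for t in tokenize(s)})
    return {tok: i for i, tok in enumerate(list(special) + tokens)}


print(tokenize("CC(=O)Cl"))  # ['C', 'C', '(', '=', 'O', ')', 'Cl']
print(build_vocabulary(["CCO", "c1ccccc1"]))
# {'<start>': 0, '<end>': 1, '1': 2, 'C': 3, 'O': 4, 'c': 5}
```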

baoilleach #Shef2019 Xuhan Liu (Leiden) on "Updated DrugEx: Drug molecule de novo design by multi-obj reinf learning for polypharm"
    --> baoilleach #Shef2019 This work is related to adenosine receptors. Explains GPCRs: 33% of drugs target these. Adenosine receptors are distributed throughout the body. Involved in stroke, psychosis, asthma, cancer, inflammation.
        --> baoilleach #Shef2019 When generating de novo molecules, the generated cmpds should have high affinity for specific targets and low for others, should have large diversity to explore the space, but also, to some extent, similar to the known ligands (applicability domain).
            --> baoilleach #Shef2019 Original DrugEx was applied to a single target (A2A receptor). This update for multiple targets (A1, A2A, A2B, A3).
                --> baoilleach #Shef2019 Convert to SMILES first as our generator is a SMILES-based RNN. Generation of tokens (vocabulary). Dataset 1 is based on 1M molecules from ZINC with drug-like properties; used to learn SMILES grammar. Dataset 2 is a GPCR set from ChEMBL, pChEMBL > 6.5 for active.
                    --> baoilleach #Shef2019 Model trained under deep reinforcement learning framework. Trained on ZINC, and fine-tuned with GPCR set. Environment-predictor was the reward function. (Liu et al 2019)
                        --> baoilleach #Shef2019 Most of the generated SMILES were valid. Describes the action of a RNN.
                            --> baoilleach #Shef2019 Shows QSAR comparison of RF, SVM, KNN (+ others) on adenosine receptors, RF does the best and so it was chosen as the predictor for the reinforcement learning. The reward function gives value for on-target versus off-target versus invalid.
                                --> baoilleach #Shef2019 Unexpected result was that molecules were not diverse - kept generating the same small no. of molecules. So we added in an exploration strategy, like in a GA, a mutation strategy ('crossover net' and 'mutation net').
                                    --> baoilleach #Shef2019 Compared to ORGANIC and REINVENT. Our algorithm can cover most of the space, but the other two cannot. Also, ours are drug-like but the others are not. Shows examples of multi-target vs single-target molecules.
                                        --> baoilleach [Ed] My own suggestion: don't penalise structures based on invalid valence. I think that's too steep a cliff. If they can't be converted to a molecule that's a problem, but even if a weird valence at one point, the rest of the molecule might be fine.
                                            --> marco_foscato @baoilleach And the definition of "invalid valence" is typically biased by organic drug-like chemistry. This hampers application of the methods in other fields such as organometallic catalysis, where a lot of chemistry is done outside such oversimplified "valid valence" rules.
                                        --> rguha @baoilleach What does it mean to say "most of the space"? Do they have a predefined space they are referring to? Or is this referring to diversity? #Shef2019
                                            --> baoilleach @rguha There was a predefined space - I missed exactly.
                                --> WendyAnneWarr @baoilleach @RSC_CICAG You two are doing a great job. Not much point in my duplicating your tweets so I am adding generalisations and useful URLs
                                    --> baoilleach @WendyAnneWarr @RSC_CICAG Thanks, but now I can't ever take a break!
                                        --> WendyAnneWarr @baoilleach @RSC_CICAG 😀
                                            --> RSC_CICAG @WendyAnneWarr @baoilleach Maybe we can take sessions in turns. I'm exhausted already!
                                                --> baoilleach @RSC_CICAG @WendyAnneWarr Does that mean I win the new Peter Willett award you mentioned? It was for tweeting, right?
                                                    --> RSC_CICAG @baoilleach @WendyAnneWarr I'm not sure Peter would sanction such an award...
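The Liu talk describes a reward that separates on-target, off-target and invalid molecules. A minimal sketch of that kind of reward, assuming the QSAR predictor returns a probability of activity for each target; the function name `multi_target_reward` and the min/max/product way of combining the scores are hypothetical choices, not taken from the talk.

```python
from typing import Sequence


def multi_target_reward(
    is_valid: bool,
    p_on_targets: Sequence[float],
    p_off_targets: Sequence[float],
) -> float:
    """Hypothetical reward: high only if every on-target is predicted active
    and no off-target is. Inputs are assumed predictor probabilities in [0, 1]."""
    if not is_valid:
        # Invalid molecules (e.g. unparseable SMILES) earn nothing.
        return 0.0
    # The weakest on-target prediction limits the reward.
    on_score = min(p_on_targets) if p_on_targets else 1.0
    # The strongest off-target prediction penalises the reward.
    off_score = max(p_off_targets) if p_off_targets else 0.0
    return on_score * (1.0 - off_score)
```

Returning 0.0 for every invalid molecule is exactly the hard cliff that the editorial reply above questions; a softer scheme would need a partial-validity score, which this sketch does not attempt.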

WendyAnneWarr #shef2019 10.26434/chemrxiv.7436789.v2

WendyAnneWarr #shef2019 Liu work https://t.co/8tM3k8qJJP

WendyAnneWarr #shef2019 Liu. Two deep neural networks interact within a reinforcement learning framework: an RNN as the agent and a multi-task fully connected DNN as the environment.

WendyAnneWarr #shef2019 Liu work done at https://t.co/qHB2ykq8Ug
    --> CDDLeiden @WendyAnneWarr Indeed, he is currently at the Computational Drug Discovery group. More details about what we do can be found here: https://t.co/MQWDEZpxUi

cthoytp First link to code at #shef2019 from Xuhan Liu at https://t.co/n6zk2xnVfQ https://t.co/FcDOLHKuyy

WendyAnneWarr #shef2019 RNN and SMILES approaches also published by other teams (https://t.co/M5MnwAprWZ). Liu’s citation for ORGANIC is wrong. I will search

iwatobipen I'm reading the #shef2019 hashtag; there are many exciting tweets. Thanks for tweeting ;)
    --> iwatobipen Of course, my colleague is attending the meeting!
    --> macinchem @iwatobipen Not sure all the tweets with #Shef2019 are showing up?
        --> iwatobipen @macinchem Yes, I'm not sure. But I can still enjoy them ;)
    --> wpwalters @iwatobipen Agreed, thanks @baoilleach and @WendyAnneWarr !

WendyAnneWarr #shef2019 first after lunch is Henriette Willems of ALBORADA Univ Cambridge. Case study of academic drug discovery enabled by virtual screening.

RSC_CICAG Chair of this session is @gmm from @MgmsUpdates. #Shef2019

RSC_CICAG First speaker of the second session is Henriette Willems from the University of Cambridge: A case study of academic drug discovery enabled by virtual screening. Abstract here: https://t.co/HgplfPOrKf #Shef2019
    --> RSC_CICAG The ALBORADA Drug Discovery Institute: https://t.co/TiFsZkuJYe #Shef2019
        --> RSC_CICAG Virtual Screening of PI5P4K lipid kinases. Some background reading on this target here: https://t.co/v2vdNuh8a4 #Shef2019
            --> RSC_CICAG 6148 compounds from GOLD and MOE VS. Top 1000s selected from three scores. Apply property cut-offs, CNS-likeness #Shef2019
                --> RSC_CICAG Hit rate from HTS of 175k, with 113k of CNS lead-like compounds. Gave 0.03% hit rate in orthogonal assay. #Shef2019
                    --> RSC_CICAG "Virtual screening is cheaper and faster, and can work very well. HTS tends to give more hits." #Shef2019
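Both threads describe taking the top 1000 compounds from each of three docking scores (GOLD ChemScore, GOLD ASP and GlideScore, per the parallel thread) and pooling them. A minimal sketch of that union-of-top-N step; the function name, the toy scores and the sign conventions in `higher_is_better` are illustrative assumptions (GOLD fitness values are usually ranked high to low and GlideScore low to high, but check your own output).

```python
from typing import Dict, Set


def consensus_selection(
    scores: Dict[str, Dict[str, float]],
    higher_is_better: Dict[str, bool],
    top_n: int,
) -> Set[str]:
    """Union of the top_n compounds ranked by each scoring function.

    scores maps a scoring-function name to {compound_id: score};
    higher_is_better records each function's sign convention.
    """
    selected: Set[str] = set()
    for name, per_compound in scores.items():
        # Sort compound IDs from best to worst for this scoring function.
        ranked = sorted(
            per_compound,
            key=per_compound.__getitem__,
            reverse=higher_is_better[name],
        )
        selected.update(ranked[:top_n])
    return selected
```

Because the lists overlap, the pooled set is usually smaller than three times `top_n`; property filters (CNS-likeness, lead-likeness) would then be applied to the union.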

baoilleach #Shef2019 Henriette Willems from ALBORADA institute, an Alzheimer's Research UK-funded institute with sister organisations in Oxford and UCL.
    --> baoilleach #Shef2019 Interested in enhancing cellular protein clearance. PI5P4K lipid kinases are the target. Lipid kinases are unusual, low homology with other kinases. Not shown on the "kinase map" - usually included as separate bits at the bottom.
        --> baoilleach #Shef2019 Are these good targets for VS? Xtals in PDB: good. Some structures have ATP/GTP bound, so we know the active site. No structure with inhibitors: bad. No potent known ligands. Some reported ligands were not active in our assays. Part of G-loop missing.
            --> baoilleach #Shef2019 Our plan was to filter down structures from vendors, dock them and select about 960 cmpds. Why 960? 3 plates. Ferreira et al report 0.8-2.5% hit rates. If we target around 10 hits, we need to test around 1000 cmpds.
                --> baoilleach #Shef2019 Shows Knime workflow. Used PAINS (from RDKit) and reactive group filters (Brenk et al, 2008). VS by pharmacophore (MOE) and then docking (GOLD). But the pharmacophores gave either too many or too few hits; wanted ~5000. So added a quick+dirty GOLD docking run first.
                    --> baoilleach #Shef2019 Redocked with more expensive protocol, GOLD (top 1000 by ChemScore, top 1000 by ASP), Glide (top 1K by Glidescore). Then ChemAxon filtering for drug-like properties and screened exptally [missed % hit rate].
                        --> baoilleach #Shef2019 Retrospective analysis showed that GOLD did pretty well on its own, but got all the hits when combined with the full protocol.
                            --> baoilleach #Shef2019 Restarted with bigger vendor library, 1.5M. Filter for lead-like, ChemAxon protomers and stereomers, create 3D structures with RDKit, then do the same as before. This time we selected more of the docking hits that had the desired MW - "tiered selection".
                                --> baoilleach #Shef2019 850 cmpds purchased. Hit rates around 1.2 and 1.4%. "So that worked pretty well." After hit follow up were able to get down to 10nM for PI5P4Kalpha, and 13nM for gamma. So 5-fold potency increase through cmpd purchasing. "SAR by catalog"
                                    --> baoilleach #Shef2019 Describes the ADMET properties of the lead molecules. Selective? On a kinase screen, one lead had only 2 hits, the other only 4 hits. Got xtal structures; both do indeed bind to the ATP pocket.
                                        --> baoilleach #Shef2019 Now talking about HTS. We recently purchased a HTS library of 175K cmpds. Hit rate of 0.03% after filtering down. Similarly for a diversity set. Compares vHTS vs HTS. HTS about 10 times more expensive; hit rate lower but got more hits.
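Willems' plate arithmetic (a 0.8 to 2.5% hit rate, about 10 hits wanted, so about 1000 compounds) and the later vHTS vs HTS comparison are simple expected-value calculations. A minimal sketch, assuming each compound is an independent hit with the quoted probability; the function names are hypothetical.

```python
import math


def expected_hits(n_compounds: int, hit_rate: float) -> float:
    """Expected number of hits if each compound is a hit with probability hit_rate."""
    return n_compounds * hit_rate


def compounds_needed(target_hits: int, hit_rate: float) -> int:
    """Smallest library size whose expected hit count reaches target_hits."""
    return math.ceil(target_hits / hit_rate)
```

With these, 960 compounds at 0.8 to 2.5% give roughly 8 to 24 expected hits, `compounds_needed(10, 0.01)` gives 1000, 850 compounds at 1.2% give about 10, and 175,000 compounds at 0.03% give about 52. That is consistent with the quoted conclusion: a much lower HTS hit rate can still yield more hits in absolute terms.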

WendyAnneWarr #shef2019 willems   https://t.co/XFQKXn9hyC

ALBORADA_DDI Our computational chemist Henriette is presenting at #shef2019 on using #VirtualScreening to aid #DrugDiscovery 💻💊 https://t.co/jESRbfZXgY

WendyAnneWarr #shef2019 willems explains functions of PI5P4K kinases. No structs with ligands in PDB. Virtual screening plan outlined. 960 compounds purchased and screened

WendyAnneWarr #shef2019 willems used a @knime workflow to do the usual filters such as PAINS. @RDKit_org and MOE

WendyAnneWarr #shef2019 willems’ pharmacophores gave too many or too few hits

OxDrugDes Jerome (@jezwicker) is looking forward to the Eighth Joint Sheffield Conference on #Chemoinformatics (#shef2019). Catch up with him there. https://t.co/m5HBDGh3s8

WendyAnneWarr #shef2019 so willems tried docking (GOLD) and MOE. Tried @chemaxon tools too.

WendyAnneWarr #shef2019 willems discusses rescoring. Now tried bigger vendor library. @chemaxon @knime @RDKit_org. Now getting compounds that are too big

WendyAnneWarr #shef2019 So willems tried tiered selection. 850 compounds selected

WendyAnneWarr #shef2019 willems “analog by catalog”

WendyAnneWarr #shef2019 Willems found 10 “active” compounds for PI5P4K alpha and 12 for gamma. Got xtal structures

WendyAnneWarr #shef2019 willems: (active means below 10 nM). vHTS is cheaper and faster and can work very well, but HTS gives more hits.

RSC_CICAG Next speaker is Andrea Morger from Charité Universitätsmedizin Berlin on A case study of toxicity prediction including reliability and confidence estimation. Abstract here: https://t.co/jsrFosjdzH #Shef2019
    --> RSC_CICAG In silico toxicity prediction will reduce and replace animal testing to filter out harmful compounds sooner - Morger #Shef2019
        --> RSC_CICAG KnowTox project between Charité and BASF using in-house data #Shef2019
            --> RSC_CICAG Method uses conformal predictions from Norinder: Introducing Conformal Prediction in Predictive Modeling. A Transparent and Flexible Alternative to Applicability Domain Determination. https://t.co/67L2Ves2N1 #Shef2019
                --> RSC_CICAG Case study with two in-house triazoles as potential fungicides. #Shef2019
                    --> RSC_CICAG Aromatase identified as off-target predicted using conformal prediction. Well-known off-target. #Shef2019
                        --> RSC_CICAG Identified two structural alerts: halogenated benzene and acyclic bivalent sulfur moiety for non-genotoxic carcinogenicity and hepatotoxicity, respectively #Shef2019
                            --> RSC_CICAG Conclusion: integration of different toxicity prediction methods, conformal prediction, and a case study of toxicity prediction. #Shef2019

WendyAnneWarr #shef2019 Correction: Willems did not get xtal structures for ALL the hits. The HTS was done on a 175,000 compound library

baoilleach #Shef2019 Andrea Morger from Charite Berlin on toxicity prediction including confidence estimation
    --> baoilleach #Shef2019 Motivation is to reduce animal testing, especially by applying early in the development of new drugs.
        --> baoilleach #Shef2019 KnowTox Project, a collab with BASF. Support toxicologists. Main datasource is ToxCast, provided by EPA (plus Tox21). When cleaned it has 8390 cmpds, 985 endpoints, sparse matrix. Validated on in-house dataset @BASF.
            --> baoilleach @BASF #Shef2019 Three methods were used for risk assessment. 1. Read-across - similar mols have similar toxic effects. 2. Structural alerts - substructures assoc with tox. We highlight, not filter. 3. Conformal prediction - statistical framework always valid at a given signif level.
                --> baoilleach @BASF #Shef2019 With normal ML you don't know whether you can trust the prediction. That's why we moved to conformal prediction (CP). Explains conformal prediction (Norinder et al JCIM 2015). The p-values referred to are not the same as the statistical p-values.
                    --> baoilleach @BASF #Shef2019 [Note to self: properly learn what conformal prediction is] Validity: CP models always valid. We also consider informational efficiency (are they useful?), as well as accuracy.
                        --> baoilleach @BASF #Shef2019 Tested on in-house dataset on AA (androgen receptor antagonism), which is related to endocrine disruption. Some adaptations to the method were needed, as there are two different datasets, and environmental chemicals vs pharmaceutical chemicals.
                            --> baoilleach @BASF #Shef2019 New model includes information from nearest nbrs, a normaliser regression model. Most of the nearest nbrs are inactive so we needed to balance the training set. At low significance the model has a low error rate/efficiency (good).
                                --> baoilleach @BASF #Shef2019 With the validated model, we find that the accuracy on the ToxCast endpoints is 79-95% correct class labels.
                                    --> baoilleach @BASF #Shef2019 Describes case study of two in-house triazoles. The single-class predictions at signif level epsilon of 0.2. Aromatases and nuclear receptors were predicted (among others). These make sense based on literature on androgen receptors and known homology of aromatase,
                                        --> baoilleach @BASF #Shef2019 Compares to structural alerts and read-across (similarity search) results. Describes in-house assays to determine reason for toxicity. A bit tricky to figure out the exact reason. [My paraphrase]
                                            --> baoilleach @BASF #Shef2019 Question about weighting false positives and false negatives differently - one kills the animal and you didn't expect, one doesn't kill the animal but you expected.
                    --> WendyAnneWarr @baoilleach @BASF #shef2019 Norinder paper is 2014 not 2015 isn’t it?
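A minimal sketch of the conformal prediction step described in the Morger thread, using class-conditional (Mondrian) inductive CP, a common variant. The nonconformity score (for example 1 minus a random-forest class probability), the toy calibration scores and the function names are assumptions; the nearest-neighbour normaliser model mentioned in the talk is not included.

```python
from typing import Dict, List, Set


def conformal_p_value(calibration_scores: List[float], test_score: float) -> float:
    """Fraction of calibration examples at least as nonconforming as the test one,
    with the +1 correction used in conformal prediction."""
    at_least_as_strange = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least_as_strange + 1) / (len(calibration_scores) + 1)


def mondrian_prediction_set(
    calibration: Dict[str, List[float]],
    test_scores: Dict[str, float],
    epsilon: float,
) -> Set[str]:
    """Class-conditional prediction set at significance level epsilon.

    calibration[label] holds nonconformity scores of calibration compounds whose
    true class is label; test_scores[label] is the test compound's score if it
    were assigned that label.
    """
    return {
        label
        for label, scores in calibration.items()
        if conformal_p_value(scores, test_scores[label]) > epsilon
    }
```

At significance level epsilon (0.2 in the triazole case study), a label is kept if its p-value exceeds epsilon. Under exchangeability the sets contain the true label with probability at least 1 - epsilon, which is the "always valid" property; a set with both labels, or none, is how the model signals that it cannot commit to a single class.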

WendyAnneWarr #shef2019 next Andrea Morger of Charite Berlin and collaborators at BASF. Case study of tox prediction including reliability and confidence estimation.

WendyAnneWarr #shef2019 Morger cleaned using MACCS fingerprints etc.

WendyAnneWarr #shef2019 for applicability domain used conformal prediction. @JCIM_ACS 2014 54 1596 and toxicology research 2017 6 (1) 73

WendyAnneWarr #shef2019 CP prediction endpoint was androgen receptor antagonism (AA), in-house set

WendyAnneWarr #shef2019 Morger included normaliser model for classification in CP. Include info from nearest neighbours. Balance training set. Over-conservative validity at significance less than 0.3. Concept validated. Now applied to other ToxCast endpoints

WendyAnneWarr #shef2019 Morger case study on two in-house triazoles, failed fungicides

WendyAnneWarr #shef2019 CYP19 aromatase is a known off-target of fungicides. Two other nuclear receptor examples discussed.

WendyAnneWarr #shef2019 Morger reports similarity search on the triazoles. And read-across propiconazole case study
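Read-across, as summarised in the Morger thread, assumes similar molecules have similar toxic effects. A minimal sketch using Tanimoto similarity on sets of "on" fingerprint bits and a k-nearest-neighbour vote; the bit sets, labels, function names and `k=3` are illustrative assumptions, since the tweets do not say which fingerprint or neighbour count was used.

```python
from typing import List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two sets of 'on' fingerprint bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def read_across(
    query: Set[int],
    library: List[Tuple[Set[int], bool]],
    k: int = 3,
) -> Tuple[float, List[float]]:
    """Fraction of the k most similar library compounds labelled toxic,
    together with their similarities to the query (most similar first)."""
    nearest = sorted(library, key=lambda item: tanimoto(query, item[0]), reverse=True)[:k]
    if not nearest:
        return 0.0, []
    toxic_fraction = sum(label for _, label in nearest) / len(nearest)
    return toxic_fraction, [tanimoto(query, bits) for bits, _ in nearest]
```

Returning the neighbour similarities alongside the vote keeps the prediction auditable: a toxicologist can see whether the "similar" compounds are actually close.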

RSC_CICAG Last speaker of this session is Al Dossetter from MedChemica with Accelerating multiple medicinal chemistry projects using Artificial Intelligence (AI): A review from the past 8 years of real world examples. Abstract: https://t.co/bKBJg4rZew #Shef2019 @al_dossetter @MedChemica
    --> RSC_CICAG Loving this hashtag from @MedChemica #BucketListPapers. The 100 papers you need to read before you die! https://t.co/CJ4oLdINUz #Shef2019
        --> RSC_CICAG Can we accelerate medicinal chemistry by augmenting the chemist with Big Data and artificial intelligence? https://t.co/Fm3vr2zQrw #Shef2019
            --> RSC_CICAG The company has been working on Explainable AI using Matched Molecular Pair Analysis to be able to map back to original transforms and metadata to explain the predictions. #Shef2019
                --> RSC_CICAG Turbocharging Matched Molecular Pair Analysis: Optimizing the Identification and Analysis of Pairs: https://t.co/5YfCLCR8LW #Shef2019
                    --> RSC_CICAG The rise of the intelligent machines in drug hunting? https://t.co/9WUtpFIMeU #Shef2019 #FreeAccess #OpenAccess
                        --> RSC_CICAG Learning Medicinal Chemistry Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) Rules from Cross-Company Matched Molecular Pairs Analysis (MMPA). https://t.co/ywtp5sJI74 #Shef2019
                            --> RSC_CICAG Original Matched Molecular Pair Analysis paper: Matched Molecular Pairs as a Guide in the Optimization of Pharmaceutical Properties; a Study of Aqueous Solubility, Plasma Protein Binding and Oral Exposure https://t.co/fw9EutLNgF #Shef2019
    --> RSC_CICAG Here is a recent version of Al's talk: https://t.co/GwVhSd4Z2R #Shef2019
    --> RSC_CICAG The company offers two software platforms for enterprise and an online version for small companies. #Shef2019

baoilleach #Shef2019 Al Dossetter (MedChemica) on accelerating multiple medicinal chemistry projects using artificial intelligence (AI)
    --> baoilleach #Shef2019 Highlights #bucketlistpapers on twitter - 100 papers you need to read before you die.
        --> baoilleach #Shef2019 MedChemica founded in 2012 on extracting SAR knowledge from med chem data. The idea is to reduce the no of cmpds made to get to a lead.
            --> baoilleach #Shef2019 "Science as a Service" - have consulted on most aspects of the pipeline. Now for the AI...the quality of an AI model depends on the no. of times the machine can learn from success and failure. AlphaGo is fast to learn because computers can play each other.
                --> baoilleach #Shef2019 ...different in drug discovery. Recently came across the term "Explainable AI" - collaboration with computers, can drill down into why a suggestion was made. MedChemica's predictions based on MMPA. Encodes the environment at several levels.
                    --> baoilleach #Shef2019 Lots of suggestions made, but which are the gems? The good stuff? Have derived "med chem rules" based on cases where there is sufficient data. Transformations where things stay the same: not boring, v useful, other properties may improve, bioisostere.
                        --> baoilleach #Shef2019 The system (MCPAIRS) can be plumbed into an existing system. Just incrementally updates as new molecules and data come in. Can do knowledge sharing with multiple companies with MMPA ("no need for blockchain!").
                            --> baoilleach #Shef2019 Shows example where ArH to ArSO2Me increases solubility but decreases logP. Later work showed that this depends on the environment, not always true.
                                --> baoilleach #Shef2019 Example of glucokinase activators, fix hERG while maintaining potency. A counter intuitive rule that goes against the dogma...[missed details]
                                    --> baoilleach #Shef2019 Cyclopropyl group can influence packing and so improve solubility. Converted 4-F-Ph to 3,4-diOMe-Ph: med chemists said it would make clearance worse, but the opposite was true.
                                        --> baoilleach #Shef2019 "I'll replace a Cl with a Nitrile, classical med chemistry, it'll make things better". But it actually made it worse.
                                            --> baoilleach #Shef2019 [Note to self: another structure changed to cyclic NCCOC ]
                                                --> baoilleach #Shef2019 Another example where a nitrogen was taken out of a bicyclic system ("chemists love putting nitrogens in") and gave an increase in potency.
                                                    --> baoilleach #Shef2019 Now talking about transparency/understandability wrt different methods. These models work really well for chemists if you highlight the parts of the molecules that matter.
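MedChemica's rules come from matched molecular pair analysis (MMPA): pairs of compounds that differ by a single transformation, summarised over many pairs so that rules are only derived where there is sufficient data. A minimal sketch of that summarising step only, assuming the pairs have already been found; the transform labels and values are invented toy data, and the real system also encodes the chemical environment at several levels, which this sketch ignores (one could key on a (transform, environment) tuple instead).

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def derive_rules(
    pairs: Iterable[Tuple[str, float, float]],
    min_pairs: int = 3,
) -> Dict[str, Dict[str, float]]:
    """Summarise matched pairs (transform, value_before, value_after) into rules.

    Transforms seen fewer than min_pairs times are dropped, so rules are only
    reported where there is enough data.
    """
    deltas: Dict[str, List[float]] = defaultdict(list)
    for transform, before, after in pairs:
        deltas[transform].append(after - before)
    rules: Dict[str, Dict[str, float]] = {}
    for transform, values in deltas.items():
        if len(values) < min_pairs:
            continue
        rules[transform] = {
            "n_pairs": len(values),
            "median_delta": median(values),
            "fraction_increased": sum(v > 0 for v in values) / len(values),
        }
    return rules
```

Because each rule keeps its supporting pairs' statistics, a chemist can drill down from a suggestion to the evidence behind it, which is the "explainable" part of the pitch.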

WendyAnneWarr #shef2019 Similar set of slides is on SlideShare, says Dossetter.

WendyAnneWarr #shef2019 Dossetter https://t.co/qqVFoZvrsD

WendyAnneWarr #shef2019 Griffin et al. DDT 2018.

WendyAnneWarr #shef2019 Leach et al. @JCIM_ACS 2017 57 2424. Explainable AI not black box. Griffin et al. Future med chem 2009 1 405

WendyAnneWarr #shef2019 J Med Chem 2018 61 3277. Dossetter mentions the significance of SMIRKS

WendyAnneWarr #shef2019 waring et al med Chem comm hERG example

WendyAnneWarr #shef2019 Crawford j med chem 2012 55 8827. Cathepsin K inhibitors for OE. Also Dossetter bioorg med chem 2010 4405

WendyAnneWarr #shef2019 Ghrelin inverse agonists : CNS target. McCoull med chem commun  2013 4 456 .

WendyAnneWarr #shef2019 Thomson et al j med chem 2015 58 9309

WendyAnneWarr #shef2019 Cancer Research UK t-Bu metabolism issue: recent results he cannot report. Also other examples from Roche and Genentech (success studies). Time saved, fewer compounds made

WendyAnneWarr #shef2019 Dossetter. What does understandability/interpretability mean? Need to increase trust in AI. Show chemists the five nearest neighbours, what to make next etc. Chemists want to see structures

WendyAnneWarr #shef2019 Dossetter @MedChemica take-homes. MMPA a form of AI that builds useful knowledge databases, finds rules, highlights gems. Medchemists need quick and easy UI. He gave 9 project examples, some going against dogma.

RSC_CICAG First reference to Snakes on a Plane from @al_dossetter in the context of fluorines in a benzene ring. #Shef2019 https://t.co/0wRkfj4HjI

    --> baoilleach @RSC_CICAG @al_dossetter #Shef2019 Haven't seen it and now you've spoilered it. :-)

RSC_CICAG “The company who invests in data is the company that will win at AI.” Al Dossetter. #Shef2019 @al_dossetter

WendyAnneWarr #shef2019 after tea we will have methods papers. First will be Shuzhe Wang of ETH Zurich on improving RDKit’s conformer generator to sample macrocycles. Co-author Sereina Riniker. But I am probably off to get my room keys rather than taking tea.

baoilleach #Shef2019 Shuzhe Wang on improvements to macrocycles conformer generation using RDKit
    --> baoilleach #Shef2019 We are interested in this because we are interested in membrane permeability of cyclic peptides. And we need structures for MD simulations, lots in parallel, for use with Markov Models.
        --> baoilleach #Shef2019 Need a set of conformers. Currently use xtal structures. Was generating with MD - not v diverse sometimes. Also don't have xtal structures sometimes.
            --> baoilleach #Shef2019 Lots of recent books and articles on macrocycles in drug discovery. Resurgence happening. Problem with conformers is conformation space too large, cannot sample effectively.
                --> baoilleach #Shef2019 Method is based on RDKit's ETKDG. Create atom pairs for bound matrices. Describes full workflow with multiple steps.
                    --> baoilleach #Shef2019 Initial chemically diverse macrocycles obtained from four sources: CSD, MAC10, BIRD and Prime (Paul Hawkins). Measure of flatness based on PCA; third PCA is the flatness measure. Also a roundness measure, eccentricity.
                        --> baoilleach #Shef2019 The default settings project the high dimension into 3D by choosing those with the highest eigenvalues. This tends to lead to rings that are maximally not flat. Better is to choose the eigenvalue randomly. [I think?]
                            --> baoilleach #Shef2019 Shows graph showing that using random coords improves sampling of all non-flat macrocycles, and doesn't change much the flat ones. So it's a good idea.
                                --> baoilleach #Shef2019 For large rings (>10 atoms?), use SMARTS patterns the same as for non-ring atoms. Only use particular ring atom patterns for smaller rings which will have preferences.
                                    --> baoilleach #Shef2019 Use ring eccentricity (ellipse) to derive bounds for rings based on the estimated perimeter of the ring. Also use eccentricity as a descriptor.
                                        --> baoilleach #Shef2019 Using eccentricity further biases towards flat cycles. It is detrimental to those with non-flat reference (xtal) structures but v. useful for flat ones.
                                            --> baoilleach #Shef2019 Shows example of structures that are v flat and v squashed. This is due to intramolecular H bonds that keep them that way. How to encode such distant interactions into the conformer generator?
                                                --> baoilleach #Shef2019 Custom Pairwise Coulombic Interactions - specified by the user - can be attractive or repulsive. Used to describe the interaction across the ring manually. The best results are v. similar to the xtal structure; the best structures are all shifted to lower RMSD.
                                                    --> baoilleach #Shef2019 In summary works very well. You can use it right now. Will be part of RDKit but can use the docker image on our lab's repository.
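The custom pairwise Coulombic interactions in the Wang thread let the user make a chosen atom pair attract or repel across the ring. A minimal sketch of such a user-defined term, assuming a plain Coulomb form k·qq/r added to whatever energy the embedding minimises; the function name, the charge-product convention and `k` are hypothetical, not the RDKit implementation.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def custom_pair_energy(
    coords: Sequence[Coord],
    pairs: Sequence[Tuple[int, int, float]],
    k: float = 1.0,
) -> float:
    """Sum of k * qq / r over user-chosen atom pairs (i, j, qq).

    A negative charge product qq makes the pair attractive (energy falls as the
    atoms approach); a positive one makes it repulsive.
    """
    energy = 0.0
    for i, j, qq in pairs:
        r = math.dist(coords[i], coords[j])
        energy += k * qq / r
    return energy
```

Giving an intramolecular H-bond donor/acceptor pair a negative `qq` would pull those atoms together during minimisation, which is the kind of long-range constraint needed for the flat, squashed macrocycles shown in the talk.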

RSC_CICAG Next up is @hjuinj from ETH Zürich on Improving RDKit’s conformer generator to sample macrocycles. #Shef2019
    --> RSC_CICAG Lots of research in macrocycles in drug discovery recently: Macrocycles in new drug discovery https://t.co/U0a1NGOq0I #Shef2019
        --> RSC_CICAG Use ring RMSD just on the macrocycle to measure performance. #Shef2019
            --> RSC_CICAG Shuzhe’s abstract is here: https://t.co/kaDPC2jABp #Shef2019
                --> RSC_CICAG Use random coordinates to start from to avoid sampling only flat structures. #Shef2019
                    --> RSC_CICAG The eccentricity scale used to measure how round the macrocycle is. #Shef2019
                        --> RSC_CICAG Source code to try out from @hjuinj is here: https://t.co/Hs9cMcxUae #Shef2019
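Both Wang threads mention an eccentricity scale and ring bounds derived from an estimated perimeter. A minimal sketch of the geometry, assuming the ring atoms lie roughly on an ellipse whose perimeter equals the number of ring bonds times a typical bond length; it uses Ramanujan's approximation for the ellipse perimeter, and the function names and 1.5 Å bond length are illustrative assumptions rather than the values in the paper.

```python
import math
from typing import Tuple


def eccentricity(a: float, b: float) -> float:
    """Eccentricity of an ellipse with semi-axes a >= b > 0 (0 means a circle)."""
    return math.sqrt(1.0 - (b / a) ** 2)


def ellipse_perimeter(a: float, b: float) -> float:
    """Ramanujan's approximation to the perimeter of an ellipse."""
    return math.pi * (3.0 * (a + b) - math.sqrt((3.0 * a + b) * (a + 3.0 * b)))


def ring_semi_axes(n_ring_atoms: int, ecc: float, bond_length: float = 1.5) -> Tuple[float, float]:
    """Semi-axes of an ellipse with eccentricity ecc whose perimeter matches the ring."""
    target = n_ring_atoms * bond_length
    ratio = math.sqrt(1.0 - ecc ** 2)  # b / a
    # At fixed shape the perimeter scales linearly with a, so rescale a unit ellipse.
    a = target / ellipse_perimeter(1.0, ratio)
    return a, a * ratio
```

Distances between roughly opposite ring atoms would then fall between about 2b and 2a, the kind of bound that could be written into a distance-geometry bounds matrix. The ring is really a polygon, so these values are estimates.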

WendyAnneWarr #shef2019 Wang outlines workflow in  @RDKit_org ETKDG

WendyAnneWarr #shef2019 ETKDG described in Riniker et al. @JCIM_ACS 2015 55 2562. Wang’s improved conformer generator will be added to @RDKit_org

WendyAnneWarr #shef2019 Witek, Wang et al. @JCIM_ACS 2018.
    --> WendyAnneWarr #shef2019 https://t.co/mLbtvQkjgL Riniker and Landrum paper @RDKit_org

baoilleach #Shef2019 Paolo Tosco (Cresset) on live 3D pose generation from 2D sketches
    --> baoilleach #Shef2019 The origin was "it would be nice to be able to draw a mol in a 2D sketcher and see it grow in the 3D view in the binding site".
        --> baoilleach #Shef2019 "How hard can it be?" - generate 3D structure, dock it into site, or align against a ref with 3D fields and shape, using the protein as excluded volume. Then "grow3D".
            --> baoilleach #Shef2019 Notes that he has the RDKit logo on every slide as it relies quite heavily on RDKit.
                --> baoilleach #Shef2019 Generate a 3D structure? Or grow3D? Shows flowchart, e.g. has the user increased the no of fragments? Has the total no of atoms increased?
                    --> baoilleach #Shef2019 Grow3D. First thing to do is map current frags to previous frags. If it's a new frag, don't care about it until connected to existing one. Shows example of adding methyl, now phenyl to that. In latter case need to try all 3 of methyl's Hs, and then sample torsions.
                        --> baoilleach #Shef2019 Symmetries! Need to try all symmetry-equivalent positions, no matter which side the user draws it on.
                            --> baoilleach #Shef2019 What about the chemically invalid states we pass through when drawing? E.g. for a sulfone we might have a hexavalent carbon before turning carbons into oxygens. grow3D waits for sensible chemistry.
                                --> baoilleach #Shef2019 Case study from paper on scaffold hopping in CHK1 drug discovery. Related to DNA-damaging anti-cancer agents, sensitising tumour cells to action of drugs.
                                    --> baoilleach #Shef2019 Refers to an intermolecular thiophene S to O= interaction. [Ed: Is this a thing?]
                                        --> baoilleach #Shef2019 Let's pretend we are the AZ chemists and try these modifications live in the 3D sketcher. Involves ring closure, but works fine. May need to force realignment to improve the structure placement from time to time.
                                            --> baoilleach #Shef2019 Shows example of much more extensive morphing. Finally shows the full flow chart which is quite large for determining when to grow or generate.

RSC_CICAG Next speaker is Paolo Tosco from @cressetgroup talking on Design in 2D, model in 3D: live 3D pose generation from 2D sketches. Abstract here: https://t.co/P1nE6jI9G6 #Shef2019
    --> RSC_CICAG Reference paper used for grow3D application: Adventures in Scaffold Morphing: Discovery of Fused Ring Heterocyclic Checkpoint Kinase 1 (CHK1) Inhibitors https://t.co/NvrfaAqUMV #Shef2019
        --> RSC_CICAG grow3D is an algorithm which aims to generate sensible 3D poses in response to a 2D chemical sketcher. #Shef2019

WendyAnneWarr #shef2019 thanks to Noel for taking over the @cressetgroup Cresset talk. While searching for literature and browsing Cresset website the WiFi network dropped me yet again. Do we have bandwidth problems?

baoilleach #Shef2019 Lee Steinberg on Topological data analysis of conformational space
    --> baoilleach #Shef2019 How do we describe conformational space as chemists? The conformational space of a molecule is the underlying structure on which the energy landscape is based. Not going to talk too much about energies here.
        --> baoilleach #Shef2019 Conformational space well-understood for alanine dipeptide. Toroidal space. A torus cannot be embedded in 3D space without stretching.
            --> baoilleach #Shef2019 A mathematical description of conf space. Remove symmetries. Align. Calculate metric.
                --> baoilleach #Shef2019 One way is to align each conformer to a reference (Euclidean). Vectorise each conformer. Another way is RMSD (Procrustes), find optimal alignment between all pairs of conformers, slow but easy to take into a/c other symmetries.
                    --> baoilleach #Shef2019 Topological data analysis (TDA). It doesn't tell you the answer to the question - just tells you how you should be looking at it in the first place. Where are the holes in the data? Has been used as a descriptor for pore geometries.
                        --> baoilleach #Shef2019 Now explaining simplicial complexes, a combinatorial approach to top spaces. Our data pts act as vertices - relationships between them determine lines/triangles/tetrahedra.
                            --> baoilleach #Shef2019 Going to count holes: calculating the Betti numbers of the object, i.e. n-dimensional holes. A 1-dimensional hole has as its boundary a ...sphere??? a line?? [One of those]
                                --> baoilleach #Shef2019 Topological features that live for a while are of interest. Shows example of distance fn of regular hexagon.
                                    --> baoilleach #Shef2019 *literal mic drop*
                                        --> baoilleach #Shef2019 Generate conformers with RDKit. Shows results for dialanine. Describing the effect of symmetry. Shows paper describing cyclooctane as the union of a torus and a klein bottle.
                                            --> baoilleach #Shef2019 [Correction: sphere and klein bottle] We need it to be a manifold so we remove all non-manifold points with local PCA. Remove them from the dataset, and cluster the remaining data. Look at the persistent features in each cluster.
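The torus in the Steinberg thread comes from the two backbone torsions (phi, psi) of alanine dipeptide, each of which wraps around at 360°. A minimal sketch of a periodic distance on that torus; it is a simpler metric than the aligned Euclidean and RMSD (Procrustes) metrics described in the talk, and the function names are hypothetical.

```python
import math
from typing import Sequence


def angular_difference(a_deg: float, b_deg: float) -> float:
    """Shortest separation between two angles in degrees, in [0, 180]."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def torsion_distance(conf1: Sequence[float], conf2: Sequence[float]) -> float:
    """Euclidean distance on the flat torus spanned by the torsion angles."""
    return math.sqrt(sum(angular_difference(x, y) ** 2 for x, y in zip(conf1, conf2)))
```

A naive difference would put 170° and -170° 340° apart rather than 20°, so conformers near the ±180° seam would look artificially distant and clusters would be split in two.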

WendyAnneWarr #shef2019 last speaker today is Lee Steinberg of Southampton on topological data analysis of conformational space. Lee explains molecular energy landscapes.

CDDVault [JUNE 17-19] Stop by our exhibit table and have a chat with Renate Baker and Susana Tomasio from CDD about your discovery informatics workflows.


#CDD #ElectronicLabNotebook #DrugDiscovery #ELN #MedicinalChemistry #Biologics #shef2019
    --> Renate100100 @CDDVault @WendyAnneWarr Thanks for retweeting, @WendyAnneWarr . It was nice meeting you last night

RSC_CICAG Last speaker today is Lee Steinberg from University of Southampton on Topological data analysis of conformational space. Abstract here: https://t.co/VHF9KVgV3a #Shef2019
    --> RSC_CICAG In case you wanted to know more about Topological Data Analysis: https://t.co/D24l8XcWHk #Shef2019
        --> RSC_CICAG Generate as diverse a set of conformers as possible obeying chemical rules, allowing all degrees of freedom to fluctuate. #Shef2019
            --> RSC_CICAG Learning a lot about algebraic topology today: https://t.co/4wcFl1AztZ #Shef2019
                --> RSC_CICAG Persistent homology allows us to characterize conformational space. Torsional degrees of freedom most prominent. RMSD metric best. #Shef2019
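The persistence idea (features that live over a range of scales are the interesting ones) is easiest to see in dimension 0, where the "holes" are just separate clusters. A minimal sketch of 0-dimensional persistent homology of a point cloud with a union-find, assuming a Vietoris-Rips filtration indexed by Euclidean edge length; it does not compute the 1-dimensional loops or higher Betti numbers discussed in the talk, for which a dedicated TDA library would be needed.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def h0_persistence(points: Sequence[Tuple[float, ...]]) -> List[float]:
    """Death scales of connected components in a Vietoris-Rips filtration.

    Every point is born at scale 0; when an edge of length d first joins two
    components, one of them dies at d. One component never dies (math.inf).
    """
    parent = list(range(len(points)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Edges sorted by length are the order in which the filtration adds them.
    edges = sorted(
        (math.dist(p, q), i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
    )
    deaths: List[float] = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths + [math.inf] if points else []
```

For the six vertices of a unit regular hexagon, every component merges at distance 1.0, leaving one component that never dies. A large gap between consecutive death values signals well-separated clusters, which is what a clustering step before looking for persistent features relies on.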

WendyAnneWarr #shef2019 I remember the donut and the coffee cup from the AI3SD meeting at Alderley Park. My report should soon be out on the Web. Some useful refs in there. Alas, the GSK talk had to be simplified: unpublished work

RSC_CICAG End of day one at #Shef2019. Now for the @MgmsUpdates AGM followed by poster presentations and a drinks reception sponsored by @nmsoftware.

WendyAnneWarr #shef2019 preprint of Lee Steinberg work should soon appear @profechem  @LeeSteinberg

WendyAnneWarr #shef2019 only thing between me and dinner/drinkies is now the MGMS AGM
    --> WendyAnneWarr @MgmsUpdates

MgmsUpdates Thank you everyone at #Shef2019 who stayed for the MGMS AGM. And welcome to all our new MGMS members!  :-)

RSC_CICAG First speaker today is Louis Bellmann from Universität Hamburg talking on Connected subgraph fingerprint: from theory to applications. Louis' abstract is here: https://t.co/OLw4OxpaTn #Shef2019
    --> RSC_CICAG Building subgraphs through the current subgraph, candidate atoms, and forbidden atoms that should not be considered. Similar to methods in pattern mining and combinatorics. #Shef2019
        --> RSC_CICAG The method is good at scaffold recognition for early enrichment in virtual screening. #Shef2019
            --> RSC_CICAG Moving on to fragment spaces: do as much work on fragments themselves and not focus too much on enumeration. Much more efficient encoding the space without explicit enumeration. #Shef2019
                --> RSC_CICAG Highlighting the work of @dr_greg_landrum and Sereina Riniker on fingerprint benchmarking: Open-source platform to benchmark fingerprints for ligand-based virtual screening.
https://t.co/ZJ9kAXMFgp #Shef2019
                    --> RSC_CICAG Conclusions:
 * CSFP method considers all structural features;
 * good for early enrichment, scaffold hopping
 * pattern recognition (MCS/clustering);
 * application to fragment spaces.
                        --> RSC_CICAG Great talk on fingerprints! #Shef2019 https://t.co/hvwIJSE1rq

baoilleach #Shef2019 Louis Bellmann on connected subgraph fingerprint
    --> baoilleach #Shef2019 Describing topological fps such as ECFP (circular subs) and topological torsions (path-like structures). But they miss other structural features, such as a sulfone substructure, which is neither circular nor a path.
        --> baoilleach #Shef2019 Our solution is to encode all structural features: CSFP (connected subgraph fp). First enumerate subgraphs, then unify, and finally collect.
            --> baoilleach #Shef2019 Enumeration should be exhaustive but you don't want to count something twice. One way is to grow subgraphs by adding nbrs. But how to avoid duplicates? We add nbrs in a canonical order, and have candidates and forbidden nodes [missed details]
                --> baoilleach #Shef2019 To create the identifier we traverse the atoms of the subgraph in a canonical order, like in CANGEN.
                    --> baoilleach #Shef2019 Describes three variations that capture diff atom properties, e.g. ring membership or not. These fps are shown on a spectrum from generic->sensitive with TT close to generic, and ECFP close to sensitive.
                        --> baoilleach #Shef2019 Performance was evaluated using the benchmarking platform from @dr_greg_landrum and Riniker, just the MUV subset.
                            --> baoilleach @dr_greg_landrum #Shef2019 Notes that the subgraph sizes are bounded, e.g. CSFP2.5 would have upper/lower bounds of 5/2. Otherwise there would be far too many. CSFP good for early enrichment. tCSFP (topological...) good for AUC.
                                --> baoilleach @dr_greg_landrum #Shef2019 Shows example of scaffold recog for early enrichment. The CSFP2.3 does twice as well as ECFP2. The iCSFP (independent...) does even better. In the latter case, the fragment is encoded as if it is not embedded in a larger molecule.
                                    --> baoilleach @dr_greg_landrum #Shef2019 Now showing applic to fragment spaces. Can be used for the subset relation, as the fingerprint bits of a fragment are found in the fingerprints of its superstructures. By design, this is true for CSFP, TT but not for ECFP.
                        --> WendyAnneWarr @baoilleach CSFP more sensitive than ECFP
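The thread describes growing subgraphs by adding neighbours in a canonical order while tracking candidate and forbidden atoms, so that no subgraph is counted twice. A minimal sketch of that general idea follows, assuming molecules are adjacency dicts over integer atom indices and that we enumerate connected atom sets; it is a textbook include/exclude scheme, not Bellmann's CONSENSUS implementation, and the function name is hypothetical. Each set is generated only from its smallest atom (the root), and once a candidate has been tried it becomes forbidden for the sibling branches that follow it.

```python
from typing import Dict, FrozenSet, List, Set

Graph = Dict[int, Set[int]]


def connected_subgraphs(adj: Graph, min_size: int, max_size: int) -> List[FrozenSet[int]]:
    """Enumerate every connected atom set S with min_size <= |S| <= max_size exactly once."""
    found: List[FrozenSet[int]] = []

    def grow(root: int, sub: Set[int], candidates: List[int], forbidden: Set[int]) -> None:
        if len(sub) >= min_size:
            found.append(frozenset(sub))
        if len(sub) == max_size:
            return
        for i, atom in enumerate(candidates):
            # Candidates tried earlier at this level are forbidden in this branch,
            # so the same set cannot be reached through two different orders.
            branch_forbidden = forbidden | set(candidates[:i])
            new_sub = sub | {atom}
            new_candidates = candidates[i + 1:] + [
                nbr for nbr in sorted(adj[atom])
                if nbr > root
                and nbr not in new_sub
                and nbr not in branch_forbidden
                and nbr not in candidates
            ]
            grow(root, new_sub, new_candidates, branch_forbidden)

    for root in sorted(adj):
        # Only atoms with a larger index than the root may join, so each set has one root.
        grow(root, {root}, sorted(n for n in adj[root] if n > root), set())
    return found


# A four-atom chain (e.g. the heavy atoms of butane) has 10 connected atom sets.
butane = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
subsets = connected_subgraphs(butane, 1, 4)
```

The size bounds play the same role as the lower/upper bounds in names like CSFP2.5: without them the number of subgraphs grows very quickly with molecule size.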

WendyAnneWarr #shef2019 day two. First three papers are on graph algorithms. First Louis Bellmann who works for Matthias Rarey at Hamburg

WendyAnneWarr #shef2019 Bellmann presents connected subgraph fingerprint CSFP

WendyAnneWarr #shef2019 another acronym: connected subgraph enumeration strategy CONSENSUS

WendyAnneWarr #shef2019 Riniker and Landrum also showed topological torsion better for AUC. Bellmann says generic fingerprints better for enrichment factor measurement

WendyAnneWarr #shef2019 Riniker and Landrum J Cheminf 2015

WendyAnneWarr #shef2019 The independent version of CSFP (iCSFP) does better than the standard version for scaffold recognition in early enrichment. iCSFP is in the middle between CSFP and tCSFP (t for topological)

WendyAnneWarr #shef2019 CSFP is compatible with combinatorial fragment spaces. Opens the way to topological search in chemical space
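The subset property behind this (every feature of a fragment also appears in the fingerprint of any molecule containing it) follows directly when each feature depends only on the subgraph itself. A brute-force sketch, assuming small labelled graphs, connected bond subsets as features and a simple isomorphism-invariant label (sorted atom labels plus sorted bond descriptors) in place of a true canonical identifier; the names and the toy sulfone graphs are hypothetical. Folding the features into a fixed-length bit vector with any hash function keeps the relation, because if A is a subset of B then hash(A) is a subset of hash(B).

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Bond = Tuple[int, int, int]  # (atom_a, atom_b, bond_order)


def is_connected(bonds: Tuple[Bond, ...]) -> bool:
    """True if the given bonds form a single connected piece."""
    adj: Dict[int, Set[int]] = {}
    for a, b, _ in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for nbr in adj[stack.pop()]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return len(seen) == len(adj)


def subgraph_features(labels: List[str], bonds: List[Bond], max_bonds: int) -> Set[Tuple]:
    """Invariant label for every connected bond subset with at most max_bonds bonds."""
    features: Set[Tuple] = set()
    for k in range(1, max_bonds + 1):
        for subset in combinations(bonds, k):
            if not is_connected(subset):
                continue
            atoms = {a for a, _, _ in subset} | {b for _, b, _ in subset}
            # The label only looks at atoms and bonds inside the subgraph.
            atom_part = tuple(sorted(labels[a] for a in atoms))
            bond_part = tuple(sorted(
                (min(labels[a], labels[b]), max(labels[a], labels[b]), order)
                for a, b, order in subset))
            features.add((atom_part, bond_part))
    return features


# Methylsulfonyl fragment C-S(=O)=O and dimethyl sulfone CS(=O)(=O)C.
frag_fp = subgraph_features(["C", "S", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 2)], 3)
mol_fp = subgraph_features(["C", "S", "O", "O", "C"],
                           [(0, 1, 1), (1, 2, 2), (1, 3, 2), (1, 4, 1)], 3)
```

An ECFP-style environment, by contrast, also encodes neighbours outside the fragment, so a fragment's environments need not reappear unchanged in its superstructures.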

cthoytp More code being shared at #shef2019 for connected subgraph fingerprints from Louis Bellmann https://t.co/8UoKHLv6Wy

    --> cthoytp May have spoken too soon - the code isn’t actually openly available
    --> WendyAnneWarr @cthoytp Python code on Rarey’s Hamburg ZBH website #shef2019

RSC_CICAG Next up in this session on graph algorithms - hosted by the legend that is Peter Willett - is Evgeni Grazhdankin University of Helsinki on Homology modelling with probabilistic restraint graphs. Abstract here: https://t.co/3Xu1PY338t #Shef2019
    --> RSC_CICAG The speaker is TIME person of the year 2006! (aren't we all...!) https://t.co/fyEFS9l5Au #Shef2019 https://t.co/Joa5oErA2y

        --> RSC_CICAG Modelling structures: de novo - costly; threading - inaccurate for drug discovery; homology modelling! #Shef2019
            --> RSC_CICAG State of the art is MODELLER: https://t.co/Z54d1ybu5B #Shef2019
                --> RSC_CICAG The main aim: restraints optimisation. Create template and alignment -> add restraints from donor and acceptor atom types as dynamic and stereochemical restraints as static. #Shef2019
                    --> RSC_CICAG Started off with MSc thesis: https://t.co/7JJVuynrnu #Shef2019
                        --> RSC_CICAG First case study on rhodopsin to hB2AR #Shef2019
                            --> RSC_CICAG Now talking on how to effectively parallelise the code over nested iterations. #Shef2019
                                --> RSC_CICAG Future:
 * apply better restraint scoring
 * interconditional restraint properties
 * mine homology model databases
 * expand to non-HB atom types (trivial)
 * waters and ligands (hard)
                                    --> RSC_CICAG Great talk on exploring protein conformations. #Shef2019 https://t.co/q9G1lWArCt

baoilleach #Shef2019 Evgeni Grazhdankin on Homology modelling with probabilistic restraint graphs
    --> baoilleach #Shef2019 "Time person of the year 2006" ("YOU") :-)
        --> baoilleach #Shef2019 Why do we need homology modelling? Lots of PDB structures now, but not so much for GPCRs. Threading to template based on sequence identity.
            --> baoilleach #Shef2019 Threading inaccurate for drug discovery. State of the art is MODELLER. Quite stable, partly open source, is relatively fast. Does not create long-range restraints, misses many favourable H bonds and no ligands or water.
                --> baoilleach #Shef2019 Solutions: optimise side chains, H bond networks, loops, sampling from conf space, and using open source tools. We are trying to place favourable distance constraints that lead to favourable models.
                    --> baoilleach #Shef2019 Static constraints are used for stereochemistry, but dynamic for modeller homology and custom distance. Implemented in Python, Postgres, R and C. Described in several Master's theses.
                        --> baoilleach #Shef2019 Starting pt: given a perfect set of dist restraints, an arbitrarily good model can be produced.  The challenges are the a priori sampling of restraints. Optimising them, and ....
                            --> baoilleach #Shef2019 Restraints are described with a simple Gaussian. The atoms are the nodes and the distance constraints are the edges. Can have multiedges [missed why?]
                                --> baoilleach #Shef2019 For H bonds, we have an initial pool of restraints between HB acceptors and donors, a random sample that disregards distance!
                                    --> baoilleach #Shef2019 Model building is iterative. Sample vertices, parameterize and sample a graph to build a model. The model is evaluated, and the probabilities are updated if reasonable.
                                        --> baoilleach #Shef2019 Once the model probs converge, the iterative procedure is exited.
                                            --> baoilleach #Shef2019 Shows application of rhodopsin to hB2AR model. You can see how H bonds are initially sampled, but only a few are realized. Later on, H bonds that are found are explicitly restricted.
                                                --> baoilleach #Shef2019 Compares the count of HBs in the new model vs MODELLER. Moving onto case study 2...again finds more H bonds in the model than MODELLER does on its own.
                                                    --> baoilleach #Shef2019 One of the most powerful features of our method is the expansion of conformational space.
                                                        --> baoilleach #Shef2019 The details. The input is a gigantic text file of restraints, which is generated programmatically. It's amenable to DL/ML. We do restraint optimisation - ant colony, GA or Pareto. If you modify one part of the molecule, it can modify other parts. Slow to evaluate...
                                                            --> baoilleach #Shef2019 Are searching for restraint motifs. Everything is parallelisable. Mentions the use of networkx, Python graph library with lots of methods.
                                                                --> baoilleach #Shef2019 Publication in preparation.
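The iterative loop described in this thread (sample restraints from the graph, build and score a model, update the edge probabilities, stop once they converge) resembles a cross-entropy style optimisation. A minimal sketch under that assumption: restraints are edges with inclusion probabilities, `score_model` is a hypothetical stand-in for building and evaluating a homology model (here it simply rewards a hidden set of "good" restraints), and the elite fraction, smoothing and tolerance values are arbitrary. It is not the speaker's code.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Tuple

Edge = Tuple[str, str]  # (donor atom, acceptor atom): one candidate H-bond restraint


def optimise_restraints(
    edges: List[Edge],
    score_model: Callable[[FrozenSet[Edge]], float],
    n_samples: int = 100,
    elite_frac: float = 0.2,
    smoothing: float = 0.5,
    tol: float = 1e-3,
    max_iter: int = 200,
    seed: int = 0,
) -> Dict[Edge, float]:
    """Cross-entropy style update of per-restraint inclusion probabilities."""
    rng = random.Random(seed)
    prob: Dict[Edge, float] = {e: 0.5 for e in edges}  # initial pool: every restraint equally likely
    n_elite = max(1, int(n_samples * elite_frac))
    for _ in range(max_iter):
        # Sample restraint graphs: each edge is kept independently with probability prob[e].
        samples = [frozenset(e for e in edges if rng.random() < prob[e]) for _ in range(n_samples)]
        # "Build and evaluate" each model, keeping the best-scoring ones.
        elite = sorted(samples, key=score_model, reverse=True)[:n_elite]
        # Move each probability towards how often that restraint appears in the elite.
        new_prob = {
            e: (1 - smoothing) * prob[e] + smoothing * sum(e in s for s in elite) / n_elite
            for e in edges
        }
        change = max(abs(new_prob[e] - prob[e]) for e in edges)
        prob = new_prob
        if change < tol:  # probabilities have converged
            break
    return prob


# Hypothetical toy problem: three restraints help the model, three hurt it.
good = {("N1", "O5"), ("N2", "O7"), ("N3", "O9")}
bad = {("N1", "O9"), ("N2", "O5"), ("N3", "O7")}


def toy_score(chosen: FrozenSet[Edge]) -> float:
    return sum(1 if e in good else -1 for e in chosen)


p = optimise_restraints(sorted(good | bad), toy_score)
```

In a real setting each call to the scoring function would mean building a full model, which is why the talk stressed that evaluation is slow and that the work is parallelisable.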

WendyAnneWarr #shef2019 Next up Evgeni Grazhdankin of uni Helsinki. Homology modelling with probabilistic restraint graphs

WendyAnneWarr #shef2019 MODELLER is useful and fast but does not create long range restraints and misses many credible H bonds
    --> DavidMa26610192 @WendyAnneWarr That's cool

RSC_CICAG The last speaker of the graph algorithms session is John Mayfield (@jwmay) from NextMove Software talking about the secrets of fast SMARTS matching. #Shef2019
    --> RSC_CICAG Audience warned that the talk might get quite technical in places, but reassures us that we will all be experts on substructure matching by the end! Timer set for twenty-five minutes! #Shef2019
        --> RSC_CICAG Refers to talk given in 2015: https://t.co/ioW9UY4L49 #Shef2019
            --> RSC_CICAG PAINS filters on 1,172mio compounds: from Rarey's 42h26m down to around 30s with optimisation. Rarey paper here: https://t.co/5sUriyNon7 #Shef2019
                --> RSC_CICAG PAINS filters run so fast that the job is finished before @pwk2013 even realises you've started it! ;-) #Shef2019
                    --> RSC_CICAG Good benchmarking paper on substructure searching from Rarey et al.: Systematic benchmark of substructure search in molecular graphs - From Ullmann to VF2 https://t.co/guWprXMECN #Shef2019
                        --> RSC_CICAG Ray and Kirsch algorithm, published in @sciencemagazine in 1957: Finding Chemical Records by Digital Computers https://t.co/KOLl56iX9K #Shef2019
                            --> RSC_CICAG Another great article on theory and algorithms of SMARTS: Comparing Molecular Patterns Using the Example of SMARTS: Theory and Algorithms
 https://t.co/OQDPHboB9r #Shef2019
                                --> RSC_CICAG Clever talk on optimising SMARTS using some terrific optimisations with code! #Shef2019 https://t.co/o2F9ua6Q8a

                    --> pwk2013 @RSC_CICAG Is that filters for pan-assay interference or filters for frequent-hitter behavior when assayed at 30-50 μM by #AlphaScreen? Would it be uncouth to say #SOSOF (same old shit only faster)? #PAINS #Shef2019 #cheminformatics #MedChem #DrugDesign

baoilleach #Shef2019 John Mayfield on the secrets of fast SMARTS matching
    --> baoilleach #Shef2019 We care mostly about SMARTS in the context of subgraph matching, but it's a general problem. SMARTS are quite tricky - fingerprint screening is often not effective for queries used in practice, so rely on atom-atom matching speed for efficiency.
        --> baoilleach #Shef2019 Shows substructure benchmark results on Andrew Dalke's benchmark set. 90% finish in one second on eMolecules, in general. But it's the long tail that kills.
            --> baoilleach #Shef2019 Example of PAINS filters. About 600 patterns. Reported as taking 42h in one case. Can be shortened down to 50s with the right approach.
                --> baoilleach #Shef2019 The favourite method is VF2 (2001) - linear memory. VF2 Plus (2015) fixes a major flaw in the original. VF2++ (2018) had additional pruning.
                    --> baoilleach #Shef2019 Some code shown. The key idea is VF2 is atom-based, iterates over each atom in the query. At each pt select a pair of atoms (one from query, one from ref) - are these feasible? - do the atoms match? Key innovation was two terminal sets.
                        --> baoilleach #Shef2019 When doing the match, the degree of the query atom must be <= the degree of molAtom. This can help with pruning.
                            --> baoilleach #Shef2019 The degree bound check always improves the performance. And now VF2 without pruning runs faster than VF2 with pruning. For chemical graphs, this makes sense. Also described in a BMC Bioinf article.
                                --> baoilleach #Shef2019 The RK algorithm (Ray and Kirsch, Science 1957) dates from when the USPTO was searching for chemicals. It's a backtracking algo over bonds. Three possibilities: neither end is mapped, one end is mapped, or both ends are mapped. If both are mapped, the bond closes a ring.
                                    --> baoilleach #Shef2019 More efficient than VF2.  For a given query we convert to a virtual machine op-codes to be more efficient, and have extra checks, e.g. SAMEPART/DIFFPART, or RXNROLE or TETRA_LEFT/RIGHT. More efficient to do this check during the match rather than at the end.
                                        --> baoilleach #Shef2019 Shows performance on the queries. RK 3 to 4 times faster than VF2 and VF2+.
                                            --> baoilleach #Shef2019 Best-first ordering versus depth-first. For CCCCBr start at the Br first. To do this, you need probabilities of different atoms and bonds. E.g. 8% of bonds are double bonds. 1% of atoms are S. Local probabilities versus global probabilities.
                                                --> baoilleach #Shef2019 Choosing a good seed atom gets you most of the way. E.g. start at heteroatom, or if not present, start at smallest degree carbon.
                                                    --> baoilleach #Shef2019 Smallest ring size (e.g. [r6]) is not invariant in a subgraph, but ring size is - Daylight were planning to use [Z<n>] for this. You spend most of the time on things that don't match - the speed on things that match is not so important - it's the backtracking that kills
                                                        --> baoilleach #Shef2019 Arthor database file format. Most molecules are boring; the majority of molecules can be described by a small no of local atom environments. We use this by finding the freqs of atom types, and sorting atom types based on this.
                                                            --> baoilleach #Shef2019 For molecules that consist of only the 255 common atom types, small records can be used (1 byte for atom type). For others, two bytes per atom. For ChEMBL, 99.44% of atoms have common atom types. For Enamine REAL, it's 100%.
                                                                --> baoilleach #Shef2019 Co-design of ATDB file-format for optimal matching, compression and minimise memory allocation.
                                                                    --> baoilleach #Shef2019 Interesting question on whether a VM-based approach could be used to quickly explore chemical space.
                                                                        --> rguha @baoilleach VM as in virtual machine? Can you expand? #Shef2019
                                                                            --> baoilleach @rguha VM as in virtual machine. Expanded! (Ok, maybe more later...)
                                                                                --> baoilleach @rguha That question was from @ThierryHanser
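Several ideas from this talk can be combined in a toy matcher: backtracking over query bonds in the spirit of Ray and Kirsch (a bond has one end or both ends already mapped, and both mapped means a ring closure; the "nothing mapped" case only arises for the seed), the degree bound (a query atom can only map to a molecule atom with at least as many neighbours), and seeding at a heteroatom, else a lowest-degree carbon. This is a minimal sketch assuming connected queries with plain element and bond-order labels; breaking ties among heteroatoms by degree is an extra assumption, all function names are hypothetical, and it has none of the op-code compilation or full SMARTS semantics described in the talk.

```python
from typing import Dict, List, Optional, Tuple

Adj = Dict[int, Dict[int, int]]  # atom -> {neighbour: bond order}


def make_adj(n_atoms: int, bonds: List[Tuple[int, int, int]]) -> Adj:
    adj: Adj = {i: {} for i in range(n_atoms)}
    for a, b, order in bonds:
        adj[a][b] = order
        adj[b][a] = order
    return adj


def choose_seed(labels: List[str], adj: Adj) -> int:
    # Heteroatoms first, then the lowest-degree atom; the index breaks remaining ties.
    return min(range(len(labels)), key=lambda i: (labels[i] == "C", len(adj[i]), i))


def bond_plan(seed: int, adj: Adj) -> List[Tuple[int, int]]:
    """Order query bonds depth-first so the first atom of every bond is already mapped."""
    plan: List[Tuple[int, int]] = []
    seen_bonds = set()
    visited, stack = {seed}, [seed]
    while stack:
        a = stack.pop()
        for b in sorted(adj[a]):
            key = frozenset((a, b))
            if key in seen_bonds:
                continue
            seen_bonds.add(key)
            plan.append((a, b))
            if b not in visited:
                visited.add(b)
                stack.append(b)
    return plan


def find_match(q_labels: List[str], q_adj: Adj, m_labels: List[str], m_adj: Adj) -> Optional[Dict[int, int]]:
    seed = choose_seed(q_labels, q_adj)
    plan = bond_plan(seed, q_adj)
    mapping: Dict[int, int] = {}
    used = set()

    def atom_ok(qa: int, ma: int) -> bool:
        # Degree bound: query degree must be <= molecule degree.
        return q_labels[qa] == m_labels[ma] and len(q_adj[qa]) <= len(m_adj[ma])

    def extend(step: int) -> bool:
        if step == len(plan):
            return True
        qa, qb = plan[step]
        ma, q_order = mapping[qa], q_adj[qa][qb]
        if qb in mapping:
            # Both ends mapped: ring closure, the molecule bond must exist with the same order.
            return m_adj[ma].get(mapping[qb]) == q_order and extend(step + 1)
        # One end mapped: try each unused molecule neighbour of the mapped end.
        for mb, m_order in m_adj[ma].items():
            if mb in used or m_order != q_order or not atom_ok(qb, mb):
                continue
            mapping[qb] = mb
            used.add(mb)
            if extend(step + 1):
                return True
            del mapping[qb]
            used.discard(mb)
        return False

    # Nothing mapped yet: try every molecule atom as the image of the seed.
    for ms in range(len(m_labels)):
        if atom_ok(seed, ms):
            mapping.clear()
            used.clear()
            mapping[seed] = ms
            used.add(ms)
            if extend(0):
                return dict(mapping)
    return None


# Query C-Br against 1-bromopropane C-C-C-Br: the Br seed anchors the search.
hit = find_match(["C", "Br"], make_adj(2, [(0, 1, 1)]),
                 ["C", "C", "C", "Br"], make_adj(4, [(0, 1, 1), (1, 2, 1), (2, 3, 1)]))
```

Starting at the rare Br atom means only one seed placement is ever tried, which is the point made in the talk about good seed atoms getting you most of the way.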

WendyAnneWarr #shef2019 last paper of the three on graph algorithms. John Mayfield of @nmsoftware . Secrets of fast SMARTS matching
    --> WendyAnneWarr Best if I leave Noel of @nmsoftware to tweet his colleague’s talk 😀

WendyAnneWarr #shef2019 Mayfield cites Rarey @JCIM_ACS 2015 on slow processing of SMARTS for PAINS

WendyAnneWarr #shef2019 and Ehrlich and Rarey, From Ullmann to VF2, @JCIM_ACS J Cheminf 2012, 4, 13
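The Arthor file-format tweets describe ranking local atom types by frequency so that molecules made only of the 255 most common types get 1-byte-per-atom records, while the rest use two bytes per atom. A minimal sketch of that frequency-ranking trick, assuming an atom type is any hashable tuple (for example element and degree), fewer than 65,536 distinct types, and a hypothetical leading flag byte to tell the two record kinds apart; the actual ATDB layout is not described in these tweets, and all names here are illustrative.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence

AtomType = Hashable


def build_type_table(molecules: Sequence[Sequence[AtomType]]) -> Dict[AtomType, int]:
    """Assign indices to atom types, most frequent first."""
    counts = Counter(t for mol in molecules for t in mol)
    ranked = sorted(counts, key=lambda t: (-counts[t], repr(t)))  # deterministic tie-break
    return {t: i for i, t in enumerate(ranked)}


def encode(mol: Sequence[AtomType], table: Dict[AtomType, int], n_common: int = 255) -> bytes:
    """Small record (1 byte per atom) if every type is common, else 2 bytes per atom.

    The leading flag byte (0 = small, 1 = wide) is a hypothetical framing choice.
    """
    idx = [table[t] for t in mol]
    if all(i < n_common for i in idx):
        return bytes([0]) + bytes(idx)
    return bytes([1]) + b"".join(i.to_bytes(2, "big") for i in idx)


def decode(record: bytes, table: Dict[AtomType, int]) -> List[AtomType]:
    inverse = {i: t for t, i in table.items()}
    if record[0] == 0:
        return [inverse[i] for i in record[1:]]
    body = record[1:]
    return [inverse[int.from_bytes(body[k:k + 2], "big")] for k in range(0, len(body), 2)]


mols = [[("C", 4), ("C", 4), ("O", 2)], [("C", 4), ("N", 3)]]
table = build_type_table(mols)
small = encode(mols[0], table)
```

Because common types get the smallest indices, the compact path is taken for almost every molecule when the type distribution is as skewed as the ChEMBL figure quoted in the thread.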

RSC_CICAG First speaker of the session on #QSAR is Kunal Roy from Jadavpur University on a new workflow for QSAR model development from small data sets: Integration of data curation, double cross-validation and consensus prediction tools. Abstract here: https://t.co/vo8uHUAnCY #Shef2019
    --> RSC_CICAG Software is available here: https://t.co/vKjNHTTSJU #Shef2019
        --> RSC_CICAG Interesting talk on making better use of our data for #QSAR predictive modelling. #Shef2019 https://t.co/HHl64GhXuP

baoilleach #Shef2019 Kunal Roy on A new workflow for QSAR model development from small data sets
    --> baoilleach #Shef2019 Ref to a primer on QSAR/QSPR Fundamental concepts by the author from Springer
        --> baoilleach #Shef2019 Shows pipeline describing guidelines for QSAR development, with references to OECD principles on validation, training/test set division.
            --> baoilleach #Shef2019 V. important to know the reliability of prediction of a model, external/internal validation. External validation is considered the gold standard for evaluation. R2-based metrics, error-based measures (RMSE, MAE, PRESS).
                --> baoilleach #Shef2019 When using small datasets, a significant amount of info is lost due to the held out samples. Also, there may be a bias due to the small training set. Outliers will have a large effect. Thus dataset curation *prior* to model development is v important.
                    --> baoilleach #Shef2019 Some of these problems can be overcome through the method of double cross-validation (DCV). In the inner loop, divide into n calibration and validation sets, while the outer loop is explicitly used for test set selection.
                        --> baoilleach #Shef2019 Also problems with the limited no of descriptors in the final models, esp for PLS or MLR. Perhaps should use a consensus of multiple models. Shows example of several models built by considering only 'qualified' cmpds (with nbrs), another with weighted average...
                            --> baoilleach #Shef2019 Software tools at https://t.co/vNpV9l4p4v for small dataset curator and modeller.
                                --> baoilleach #Shef2019 The curator tool enables duplicate analysis (descriptor-based), removal of structural and response outliers, and activity cliff analysis (using t-test) giving a final ready-to-use dataset. For near duplicates in descriptors, if response different, both should be removed
                                    --> baoilleach #Shef2019 Describes identification of activity cliffs. The details depend on the number of structural nbrs. For 1 nbr, do the responses differ by >1 log unit?
                                        --> baoilleach #Shef2019 The small dataset modeller implements the pipelines described earlier. Now showing case studies. Shows that his approach has lower MAE compared to the classic approach. After curation of a response outlier and activity cliff removal, MAE lower again.
                                            --> baoilleach #Shef2019 Question about size of datasets. Answer: Up to 150 molecules. [I think] Question about what to do with activity cliff detection? Answer: The user determines what to do.
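Double cross-validation as described works like this: an inner loop splits the training data into calibration and validation sets to choose a model, and an outer loop supplies held-out test folds that are never used for that choice. A minimal pure-Python sketch, assuming a single descriptor and a k-nearest-neighbour regressor as a stand-in for the MLR/PLS models in Roy's tools; fold assignment by index modulo the number of folds and the candidate k values are arbitrary choices, and all names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (descriptor value, response)


def knn_predict(train: Sequence[Point], x: float, k: int) -> float:
    """Mean response of the k training points closest to x (stand-in for MLR/PLS)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return mean(y for _, y in nearest)


def split_folds(data: Sequence[Point], n_folds: int) -> List[Tuple[List[Point], List[Point]]]:
    """(training part, held-out part) pairs; item i goes to fold i % n_folds."""
    return [
        ([p for i, p in enumerate(data) if i % n_folds != f],
         [p for i, p in enumerate(data) if i % n_folds == f])
        for f in range(n_folds)
    ]


def cv_mae(data: Sequence[Point], k: int, n_folds: int) -> float:
    """Inner-loop error: calibrate on n-1 folds, validate on the remaining one."""
    return mean(abs(knn_predict(cal, x, k) - y)
                for cal, val in split_folds(data, n_folds) for x, y in val)


def double_cv(data: Sequence[Point], ks: Sequence[int],
              outer: int = 5, inner: int = 4) -> List[Tuple[int, float]]:
    """For each outer fold: (k chosen by the inner loop, MAE on the untouched test fold)."""
    results = []
    for train, test in split_folds(data, outer):
        # Inner loop: model selection sees the training part only.
        best_k = min(ks, key=lambda k: cv_mae(train, k, inner))
        # Outer loop: the test fold gives an external error estimate.
        test_mae = mean(abs(knn_predict(train, x, best_k) - y) for x, y in test)
        results.append((best_k, test_mae))
    return results


data = [(float(i), 2.0 * i) for i in range(20)]
res = double_cv(data, ks=[1, 2, 3])
```

Each outer fold may pick a different k; reporting the spread of test errors across folds is what makes the estimate less dependent on a single train/test split.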

WendyAnneWarr #shef2019 now the QSAR papers. First Kunal Roy. New workflow for QSAR model development from small datasets: integration of data curation, double cross-validation, and consensus prediction tools

WendyAnneWarr #shef2019 Roy’s approach does not split the dataset

WendyAnneWarr #shef2019 in data curation step Roy identifies structural and response range outliers

WendyAnneWarr #shef2019 Roy also identifies activity cliffs
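For the single-neighbour case mentioned in the thread (a compound with exactly one structural neighbour, flagged if the two responses differ by more than one log unit), a sketch could look like this. It assumes compounds are represented as sets of fingerprint features and that "structural neighbour" means Tanimoto similarity at or above a hypothetical 0.7 cut-off; it leaves out the t-test used when there are several neighbours, and the compound data are made up.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def single_neighbour_cliffs(
    compounds: Dict[str, Tuple[FrozenSet[str], float]],  # name -> (features, pActivity)
    sim_cutoff: float = 0.7,
    log_gap: float = 1.0,
) -> List[str]:
    """Names of compounds with exactly one neighbour whose response differs by > log_gap."""
    flagged = []
    for name, (feats, resp) in compounds.items():
        nbrs = [other for other, (f2, _) in compounds.items()
                if other != name and tanimoto(feats, f2) >= sim_cutoff]
        # Only the one-neighbour case is handled here.
        if len(nbrs) == 1 and abs(resp - compounds[nbrs[0]][1]) > log_gap:
            flagged.append(name)
    return flagged


compounds = {
    "A": (frozenset("abcd"), 7.5),
    "B": (frozenset("abcde"), 5.9),   # similar to A, 1.6 log units apart
    "C": (frozenset("xy"), 6.0),      # no neighbours
    "E": (frozenset("pqrs"), 6.0),
    "F": (frozenset("pqrst"), 6.5),   # similar to E, only 0.5 log units apart
}
cliffs = single_neighbour_cliffs(compounds)
```

Whether a flagged pair is removed or kept is left to the user, as in the answer given to the audience question.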

WendyAnneWarr #shef2019 For double cross-validation, Roy carries out LMO (leave-many-out) in different iterations and selects a model.

baoilleach #Shef2019 https://t.co/RCvdKe6X6i

WendyAnneWarr #shef2019 https://t.co/jySiYyCakW Roy’s workflow freely available

WendyAnneWarr #shef2019 This is Roy’s QSAR primer https://t.co/f6viOckEEX

WendyAnneWarr Forgot the hashtag on this one about MGMS AGM #shef2019 https://t.co/vPxOSz8E4b

RSC_CICAG Second speaker on #QSAR is Martin Packer from AstraZeneca on Free energy perturbation and Free-Wilson models compared and contrasted. #Shef2019
    --> RSC_CICAG Prediction of binding affinity - a hard problem! #Shef2019
        --> RSC_CICAG Abstract here: https://t.co/O9D9pUO5mu #Shef2019
            --> RSC_CICAG Martin is comparing free energy perturbation (computationally expensive) and Free-Wilson analysis (computationally inexpensive) #Shef2019
                --> RSC_CICAG Original paper on Free-Wilson Analysis: A Mathematical Contribution to Structure-Activity Studies https://t.co/UxRFMU2yjY #Shef2019
                    --> RSC_CICAG The indicator variable model is completely confined to what it has seen already. Therefore, not good at extrapolating beyond R-groups already observed. #Shef2019
                        --> RSC_CICAG Free-Wilson is a very simple and intuitive model - but relies on plentiful experimental data #Shef2019
                            --> RSC_CICAG FEP is a really complex model, but in principle requires a single protein-ligand structure and very limited experimental benchmark data. #Shef2019
                                --> RSC_CICAG What if you just did docking to rank affinity using Glide and MM-GBSA? Could do a reasonable job but no idea where to draw the line to make good compounds nor what IC50s would emerge from the ranking. #Shef2019
                                    --> RSC_CICAG Free and Wilson didn't claim their analysis is a modelling tool, because there is no 3D involved; it just shows you what you have. #Shef2019
                                        --> RSC_CICAG Great talk on FEP and Free-Wilson Analysis. #Shef2019 https://t.co/aDhVfke3QC

baoilleach #Shef2019 Martin Packer (AZ) on Comparison of Free E perturbation and Free-Wilson.
    --> baoilleach #Shef2019 Prediction of binding affinity is a hard problem. How to balance the computational effort you put in against the amount of data you need to come up with valid models. Want the minimal amount of effort in total, whether exptal or computational.
        --> baoilleach #Shef2019 FEP requires high computational effort, but doesn't need much exptal data. Free-Wilson requires little computational effort but a lot of data. How to balance?
            --> baoilleach #Shef2019 Explanation of FEP. Zwanzig equation (1954). We can quantify any alchemical change. Perturbations must be conservative if we expect accuracy. For example, if it might change binding mode, you should worry that it won't work.
                --> baoilleach #Shef2019 FEP in practice involves deltaG of ligand change in solution, then ligand change in protein. That's easy enough. But we need to know a binding energy of one of the ligands in order to complete the cycle for other ligands.
                    --> baoilleach #Shef2019 Free-Wilson (JMC 1964). R groups as indicator variables. Calculate group contributions of R groups to predict pIC50 for an unknown. The more data I have, the more likely I am to use it. Indicator variables are either 1 or 0; not particularly good at extrapolating beyond
                        --> baoilleach #Shef2019 ..the R groups you've seen. But this is a free of FEP. One thing you can do, is use similarity to assign a set of fractional indicator variables for a new R group. Greatly expands the range.
                            --> baoilleach #Shef2019 Why compare these? Both are perturbation models. We know one thing, and we want something else. Can explore the return on computational effort. Free-Wilson simple and intuitive, relies on plentiful exptal data. FEP is cmplx, needs P-L structure
                                --> baoilleach #Shef2019 Taking model system of 11beta-HSD1. Notes in passing that symmetry causes problems for Free-Wilson. For FEP, simulated in fully solvated PL system with 14.6K water molecules. Routine sim time is 5ns.
                                    --> baoilleach #Shef2019 Shows FEP predictions. Quite a few molecules where the answers are within the error bar. Also a few where there appears to be a systematic error. For Free-Wilson, leave-1-out model used, [What's the R2?]
                                        --> baoilleach #Shef2019 Looking at Free-Wilson enumeration. Lots of FW predictions off the line. I expected FW to overestimate potency when they wouldn't and that FEP would correct that. Actually it's the other way around. FW is underestimating. And when FEP makes an error, it tends to overestimate.
                                            --> baoilleach #Shef2019 One example explained by a different binding mode found in docking results.
                                                --> baoilleach #Shef2019 What if you just did docking? You get sensible ranking but no idea of the IC50s.
                                                    --> baoilleach #Shef2019 Now using multiple internal projects where FEP results are available. Can't show structures. Showing predictions [but again no RMSE or R2]. Ref to "non-parametric density" in the scatter plots.
                                                        --> baoilleach #Shef2019 In the original paper, they didn't claim that FW is a modelling tool as it doesn't incorporate 3D.  But there's a synergy between these approaches if you use generalised-FW.
                                                        --> macinchem @baoilleach Not sure of value if structures are not shown.
                                                    --> dgelemi @baoilleach Was it as good as FEP at ranking ?
                                                        --> baoilleach @dgelemi Good question. Maybe. Like I said, no RMSEs or R2 were shown. Deliberately he said at the end, but it means that I don't know the answer.
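Free-Wilson analysis as described treats each R group at each position as a 0/1 indicator and fits additive group contributions; the generalised variant replaces the indicator for an unseen group with fractional weights derived from its similarity to groups already seen. A minimal numpy sketch, assuming two substitution positions, H as the reference group at each position, and made-up pIC50 values and similarity weights chosen only so the arithmetic is easy to follow; none of this is data from the talk.

```python
import numpy as np

# Columns: intercept, R1=Me, R1=Cl, R2=OMe (H is the reference group at both positions).
COLUMNS = ["const", "R1_Me", "R1_Cl", "R2_OMe"]


def indicator_row(weights: dict) -> np.ndarray:
    """One design-matrix row; weights maps column name -> indicator value (0/1 or fractional)."""
    row = np.zeros(len(COLUMNS))
    row[0] = 1.0
    for name, w in weights.items():
        row[COLUMNS.index(name)] = w
    return row


# Hypothetical training set: (indicator weights, pIC50).
training = [
    ({}, 5.0),                          # R1=H,  R2=H
    ({"R1_Me": 1}, 5.5),                # R1=Me, R2=H
    ({"R1_Cl": 1}, 6.0),                # R1=Cl, R2=H
    ({"R2_OMe": 1}, 5.3),               # R1=H,  R2=OMe
    ({"R1_Me": 1, "R2_OMe": 1}, 5.8),   # R1=Me, R2=OMe
]
X = np.array([indicator_row(w) for w, _ in training])
y = np.array([p for _, p in training])
contributions, *_ = np.linalg.lstsq(X, y, rcond=None)  # additive group contributions


def predict(weights: dict) -> float:
    return float(indicator_row(weights) @ contributions)


# Unseen combination of seen groups: the classic Free-Wilson prediction.
pred_cl_ome = predict({"R1_Cl": 1, "R2_OMe": 1})
# Unseen group at R1 (e.g. Br) as fractional indicators from hypothetical similarities.
pred_br = predict({"R1_Cl": 0.8, "R1_Me": 0.2})
```

With 0/1 indicators the model can only recombine contributions it has already fitted; the fractional row is what lets it say something about a group it has never seen, which is the extension mentioned in the thread.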

WendyAnneWarr #shef2019 Now Martin Packer compares FEP and Free-Wilson models. Prediction of binding is a hard problem. Optimise comp effort against data required for valid models

WendyAnneWarr #shef2019 Packer’s slides will be put on the web soon

WendyAnneWarr #shef2019 Packer explains alchemical change and Zwanzig equation
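The Zwanzig relation gives the free-energy change of an alchemical step as an exponential average over the reference state: Delta A = -kT ln < exp(-Delta U / kT) >_A, where Delta U = U_B - U_A is evaluated on configurations sampled in state A. A minimal sketch of that estimator, assuming energy differences in kcal/mol and kT of about 0.593 kcal/mol (roughly 298 K); a log-sum-exp is used so large exponents do not overflow. Production FEP uses many intermediate lambda windows and better-behaved estimators, which this sketch does not attempt.

```python
import math
from typing import Sequence

KT_298 = 0.593  # kcal/mol, approximately k_B * 298 K


def zwanzig_delta_a(delta_u: Sequence[float], kt: float = KT_298) -> float:
    """Free-energy difference A(B) - A(A) from U_B - U_A values sampled in state A."""
    exponents = [-du / kt for du in delta_u]
    m = max(exponents)
    # log of the mean of exp(exponents), computed stably.
    log_mean = m + math.log(sum(math.exp(e - m) for e in exponents) / len(exponents))
    return -kt * log_mean


constant = zwanzig_delta_a([1.0] * 10)   # identical samples: Delta A equals Delta U
spread = zwanzig_delta_a([0.0, 2.0])     # dominated by the low-energy sample
```

The second call shows why perturbations need to be conservative: the average is dominated by rare low-energy samples, so if the two states overlap poorly the estimate converges badly.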

WendyAnneWarr #shef2019 Packer. But perturbations must be conservative

WendyAnneWarr #shef2019 in the delta G cycle perturbing the ligand and protein is much easier than the two delta G binding calculations
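The cycle referred to here can be written out for two ligands A and B: the tractable legs are the alchemical mutations of A into B in the protein complex and in solvent, and closing the cycle gives the difference in binding free energies. An absolute value for B still needs a known binding free energy for A, for example an experimental one.

```latex
\begin{aligned}
\Delta\Delta G_{\mathrm{bind}}(A \to B)
  &= \Delta G_{\mathrm{bind}}(B) - \Delta G_{\mathrm{bind}}(A) \\
  &= \Delta G_{\mathrm{complex}}(A \to B) - \Delta G_{\mathrm{solvent}}(A \to B), \\
\Delta G_{\mathrm{bind}}(B)
  &= \Delta G_{\mathrm{bind}}^{\mathrm{exp}}(A) + \Delta\Delta G_{\mathrm{bind}}(A \to B).
\end{aligned}
```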

WendyAnneWarr #shef2019 why compare FEP And FW? They are both perturbation models. FW practical intuitive but needs lots of data. FEP complex but needs fewer data

RSC_CICAG Great talk from @jwmay at #Shef2019. https://t.co/QC7F22HW3M

WendyAnneWarr #shef2019 Packer case study 11-beta-HSD1. PDB 3OQ1

WendyAnneWarr #shef2019 Packer shows FW R-group matrix. FEP simulates a fully solvated protein ligand system.

WendyAnneWarr #shef2019 experimental versus predicted for the two methods shown. Both pretty good around the straight line. In practice there are often few days. So enumerate R-groups and compare for the larger dataset

WendyAnneWarr #shef2019 Packer: significant amount of agreement between the two methods. He thought FW would overpredict but actually FEP overestimates binding

WendyAnneWarr Correction : few DATA #shef2019 https://t.co/hcoEx3Thc2

WendyAnneWarr #shef2019 Packer. Maybe the binding mode is at the root of this.

WendyAnneWarr #shef2019 Packer. What if you just did docking? Glide score & MM-GBSA dG bind shown. Sensible ranking but no idea of actual binding affinity

WendyAnneWarr #shef2019 now did the FW model using only the smaller amount of info available to FEP. for multiple projects

WendyAnneWarr #shef2019 Packer FEP vs exptl . Looks like broad spread around the line but actually most points are close to line. Ditto for FW. Trend not unlike his first one protein attempt.

WendyAnneWarr #shef2019 SAR insight available where models fail to agree.

WendyAnneWarr #shef2019 Packer used MATLAB, JMP, OEChem (@OpenEyeSoftware), and @Schrodinger FEP+ and Desmond

RSC_CICAG Last speaker in the #QSAR session is Sébastien Guesné from Lhasa Limited on Conformal calibration of probabilistic predictions. Abstract is here: https://t.co/PO7mfOVyGT #Shef2019
    --> RSC_CICAG First part on Decision Domain: a more formal definition of applicability domain. Paper here: https://t.co/QewqcyK2GG #Shef2019
        --> RSC_CICAG Assessment of Probability Estimates: output is estimate of class membership probability distribution for each point -> colour a histogram of positive and negative class predictions then bin into five bins. Gives you the accuracy of prediction in each bin. #Shef2019
            --> RSC_CICAG Lack of calibration due to imbalanced datasets. Need to calibrate the class probabilities. #Shef2019
                --> RSC_CICAG Lhasa Limited's Decision Domain: applicability, reliability and decidability domains. #Shef2019 https://t.co/PwiFOxMw2V

                    --> RSC_CICAG Great talk on Conformal Predictions from Lhasa! #Shef2019 https://t.co/PTeTmIsCBG
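The assessment of probability estimates described in this thread (bin the predicted class probabilities into five bins and check how often the predictions in each bin are right) is essentially a reliability diagram. A minimal sketch, assuming five equal-width bins on [0, 1] over the predicted probability of the positive class, reporting per bin the mean predicted probability and the observed fraction of positives; a well-calibrated model has the two close in every bin. The function name and toy data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def reliability_bins(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 5
) -> List[Optional[Tuple[float, float, int]]]:
    """Per bin: (mean predicted P(positive), observed positive fraction, count), or None if empty."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[idx].append((p, y))
    out: List[Optional[Tuple[float, float, int]]] = []
    for b in bins:
        if not b:
            out.append(None)
            continue
        mean_p = sum(p for p, _ in b) / len(b)
        frac_pos = sum(y for _, y in b) / len(b)
        out.append((mean_p, frac_pos, len(b)))
    return out


bins = reliability_bins([0.1, 0.1, 0.9, 0.9, 0.95], [0, 0, 1, 1, 0])
```

On an imbalanced dataset the observed fractions typically drift away from the predicted probabilities, which is the lack of calibration the talk set out to correct.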

baoilleach #Shef2019 Sébastien Guesné (Lhasa) on Conformal calibration of probabilistic predictions.
    --> baoilleach #Shef2019 Can we use conformal predictions to calibrate the preds from a binary classifier?
        --> baoilleach #Shef2019 Why interested? We want to define the decision domain. This is a more formal defn of the applicability domain. (Several publications from @ThierryHanser on the topic).
            --> baoilleach @ThierryHanser #Shef2019 The decidability score is the difference between the two probabilities, the greater it is the more confident we are in the prediction. Positive at 75% of prob, -ve at 25%, decision score is 50%.
                --> baoilleach @ThierryHanser #Shef2019 Shows how to calibrate the class probabilities using a threshold. A ref to a weather forecasting paper on reliability estimates.
                    --> baoilleach @ThierryHanser #Shef2019 Decision domain: Applicability, reliability, decidability.
                        --> baoilleach @ThierryHanser #Shef2019 Conformal prediction (CP). A conformal classifier is a wrapper around a classifier. Explains CP [I still need to learn about this...]. Gives example of binary classifier +ve/-ve. Need only a single class in the predicted region; if both or none, then not informative.
                            --> baoilleach @ThierryHanser #Shef2019 Another way to look at it is that there's a "both" region (+ve and -ve), then a "unique class region" (+ve), then a "null" region along the significance axis.
                                --> baoilleach @ThierryHanser #Shef2019 Shows a conformal classifier in practice. We want to transform the prediction region back to probabilities; we need to calibrate this. All pValues normalised to largest. Null values are divided by the no of classes.
                                    --> baoilleach @ThierryHanser #Shef2019 Shows the result for a training/test set. AUC is 77%. MCC is 0.4. Balanced accuracy is 70%. This is an increase from MCC of 0.25 and accuracy of 57%.
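For readers who, like the tweeter, are new to conformal prediction: a class-conditional (Mondrian) inductive conformal classifier compares a new compound's nonconformity score for each candidate class with the scores of held-out calibration compounds of that class, converts the comparison into a p-value, and includes the class in the prediction region when its p-value exceeds the chosen significance level. A minimal sketch, assuming nonconformity = 1 - (underlying model's probability for the class) and hypothetical calibration scores; it shows how the "both", single-class and empty ("null") regions arise along the significance axis, not Lhasa's calibration method.

```python
from typing import Dict, List, Sequence


def p_value(cal_scores: Sequence[float], score: float) -> float:
    """Fraction of calibration scores at least as nonconforming, with the usual +1 correction."""
    return (sum(s >= score for s in cal_scores) + 1) / (len(cal_scores) + 1)


def prediction_region(
    cal: Dict[str, Sequence[float]],  # class -> nonconformity scores of calibration examples of that class
    test_probs: Dict[str, float],     # class -> underlying model probability for the new compound
    significance: float,
) -> List[str]:
    region = []
    for cls, prob in test_probs.items():
        score = 1.0 - prob  # nonconformity: a low probability is unusual for that class
        if p_value(cal[cls], score) > significance:
            region.append(cls)
    return region


cal = {"pos": [0.1, 0.2, 0.3, 0.4], "neg": [0.1, 0.2, 0.3, 0.4]}
new = {"pos": 0.85, "neg": 0.15}
both = prediction_region(cal, new, 0.1)    # p-values 0.8 and 0.2: both classes kept
single = prediction_region(cal, new, 0.3)  # only the positive class survives
null = prediction_region(cal, new, 0.85)   # empty region
```

Only the single-class region is informative for a binary decision, matching the point in the talk that "both" and "null" regions are not.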

WendyAnneWarr #shef2019 Sébastien Guesné of Lhasa on conformal calibration of probabilistic predictions

WendyAnneWarr #shef2019 Guesne’s first topic is formal definition of applicability domain (published in SAR QSAR in... I missed the ref)

WendyAnneWarr #shef2019 Guesne uses a decidability score. Weather and Forecasting 2007, 22, 651 reports work on reliability algorithms. Guesne: importance of calibrating the class probabilities.
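For a binary classifier the decidability score described in the talk is the gap between the two class probabilities, which reduces to |2p - 1| when P(positive) = p; the 75%/25% example gives 0.5. A tiny sketch; the 0.4 cut-off used to call a prediction "decidable" is a hypothetical illustration, not a Lhasa setting.

```python
def decidability(p_positive: float) -> float:
    """|P(positive) - P(negative)| for a binary classifier."""
    return abs(p_positive - (1.0 - p_positive))


def is_decidable(p_positive: float, cutoff: float = 0.4) -> bool:
    # Hypothetical threshold: larger gaps mean a more confident call.
    return decidability(p_positive) >= cutoff


score = decidability(0.75)  # the 75% / 25% example
```

The score only makes sense once the probabilities are calibrated, which is why calibration comes first in the talk.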

WendyAnneWarr #shef2019 the Guesne citation was SAR QSAR Environ Res 2016 27 893

WendyAnneWarr #shef2019 two Norinder publications on conformal classifier. One in SAR QSAR Environ res 2016. One in JCAMD

adlvdl @RSC_CICAG I'm really liking the GIFs you are posting for the talks at #Shef2019. Next time anyone asks me what is Chemoinformatics I will just show them a reel of the GIFS 😆

WendyAnneWarr #shef2019  No tweets recently because I was trying out use of @SciFinder on my phone to get that Ulf Norinder ref I missed tweeting. Takes a while to get used to @SciFinder on phone but I did find all Norinder refs eventually. There seem to be some 2019 ones inc @JCIM_ACS  one

Dr_GHill One thing is for certain, chem(o)informaticians are the most productive live conference tweeters I have ever seen. Check out #shef2019 if you need proof.
    --> ChristofJaeger @Dr_GHill Chemo! I am with team Chemoinformatics ;)
        --> Dr_GHill @ChristofJaeger It’s definitely with an “o” in Sheffield.
    --> adlvdl @Dr_GHill Very true!

RSC_CICAG Sheffield Cheminformatics meeting as brilliant as ever. #Shef2019

baoilleach #Shef2019 Lindsey Burggraaff on predicting polypharmacology in kinases
    --> baoilleach #Shef2019 Most medicines are single-target drugs. But in oncology, resistance occurs for such drugs and so need combination therapy, but can lead to drug-drug interactions and toxicity. Solution: a multi-drug drug, polypharmacological drug.
        --> baoilleach #Shef2019 Describes the DREAM challenge, a non-profit open science challenge. One challenge was for cancer therapeutics. Kinases: 7 on-targets (wanted) and 2 off-targets (unwanted). Cmpds should be novel and patentable.
            --> baoilleach #Shef2019 Describes workflow. What data do we have available? Cleaned up data from ChEMBL. Remove large structures, duplicates and pChEMBL must be >= 6.5. For off targets, no of inactives quite low. Have X-ray structures for all but 2.
                --> baoilleach #Shef2019 Use one statistical model as a filtering step. A second one is used for ranking cmpds. The first is a simple QSAR for the main kinase, RET, based on cmpd descriptors.
                    --> baoilleach #Shef2019 For scoring, wanted it more accurate. Use proteochemometrics (PCM) model built on 371 kinases (643K data points) - includes cmpd descriptors and protein descriptors.
                        --> baoilleach #Shef2019 Optimised parameters with scikit-learn (randomised search) with a fixed number of trees (300). Also had access to Janssen Biosignature, which is a logistic model trained on different features. Only included these models if ROC AUC > 0.8.
                            --> baoilleach #Shef2019 In praise of sandwiches - "it tastes better together!"
                                --> baoilleach #Shef2019 Now structure-based docking. Structures not great, so used induced fit docking. Also used interaction fps to score. Benchmarked on 328 structures. Used SPLIFs to score - circular interaction fingerprints.
                                    --> baoilleach #Shef2019 Combined docking and SPLIF scores using a Z2-ensemble. [Ed: not sure what this is?]
                                        --> baoilleach #Shef2019 Metadynamics was the last step. V time-consuming. Took the top 100 docked cmpds. Take the docked poses and try to push them out of the binding site - if a ligand stays longer, then it has higher binding affinity. Improved results over docking alone.
                                            --> baoilleach #Shef2019 Shows examples of ligand that stays in binding pocket versus moves out during the metadynamics.
                                                --> baoilleach #Shef2019 Combined the scores from various methods. Visually inspected the top 200 and selected one for exptal validation. Could only select one for this competition [I think?]. Result was inactive on all 9 kinases. No points for on-targets but good for off-targets.
                                                    --> baoilleach #Shef2019 Some teams did find cmpds with the desired profile, but were not novel. Conclusions? It's a true challenge. Ordered 50 more cmpds. Work in progress, these cmpds have not yet arrived.
                                        --> LenselinkBart @baoilleach From this paper:

RSC_CICAG First speaker in this afternoon's session is Lindsey Burggraaff from Leiden University on Case studies in predicting polypharmacology in kinases. https://t.co/xUSHQ8kphc #Shef2019
    --> RSC_CICAG Looking at nine kinases: the main on/off-kinases RET and MKNK1, and the additional on/off-kinases BRAF, SRC, S6K/RPS6KB1, TTK, Erk8/MAPK15, PDK1 and PAK3. #Shef2019
        --> RSC_CICAG The modelling workflow is compared to a sandwich of different methods: the models are better together. #Shef2019
            --> RSC_CICAG Docking alone gives an AUC of 0.88; combining it with Metadynamics brings it up to 0.92. #Shef2019
                --> RSC_CICAG Prediction of polypharmacology in kinases is a big challenge. Combining activity and novelty makes it even more challenging. Testing more compounds... #Shef2019
                    --> RSC_CICAG What have we learned?
 * Methods are data dependent.
 * Structure-based model performances vary greatly between crystal structures.
 * Z2-scoring can have a significant impact on performance.
 * Metadynamics can be used to enhance pose scoring.
                        --> RSC_CICAG Great talk on model 'sandwiches' with a predictive filling! #Shef2019 https://t.co/XAQgs75y7y

WendyAnneWarr #shef2019 After lunch, a set of case studies. Lindsey Burggraaff of Leiden on predicting polypharmacology in kinases. A polypharmacological drug could avoid resistance in cancer treatment.

BZdrazil What a very nice and sweet way of selling your product #cheminformatics #Shef2019 https://t.co/jG0Q8QGWou

WendyAnneWarr #shef2019 Lindsey took data from ChEMBL, MW < 700. Activity threshold pChEMBL 6.5.
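pChEMBL is the negative log10 of a molar potency (IC50, Ki and so on), so the 6.5 threshold corresponds to roughly 316 nM. A minimal sketch of the clean-up described in the talk (drop large structures, merge duplicates, label actives at pChEMBL >= 6.5). Averaging duplicate measurements is an assumption, and the record layout and function names are hypothetical.

```python
import math


def pchembl_from_nM(value_nM):
    """pChEMBL-style value: -log10 of a molar potency given in nanomolar."""
    return 9.0 - math.log10(value_nM)


def label_actives(records, mw_max=700.0, threshold=6.5):
    """records: iterable of (compound_id, mol_weight, potency_nM) tuples (hypothetical layout).

    Drops structures with MW >= mw_max, merges duplicate compound IDs by
    averaging their pChEMBL values (an assumption), and labels a compound
    active (1) if the merged value is >= threshold, else inactive (0).
    """
    merged = {}
    for cid, mw, potency_nM in records:
        if mw >= mw_max:
            continue  # remove large structures
        merged.setdefault(cid, []).append(pchembl_from_nM(potency_nM))
    return {cid: int(sum(vals) / len(vals) >= threshold) for cid, vals in merged.items()}


records = [
    ("cpd-1", 420.5, 10.0),     # pChEMBL 8.0
    ("cpd-1", 420.5, 1000.0),   # duplicate, pChEMBL 6.0 -> mean 7.0
    ("cpd-2", 812.0, 1.0),      # too large, removed
    ("cpd-3", 350.2, 10000.0),  # pChEMBL 5.0
]
print(label_actives(records))
```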

WendyAnneWarr #shef2019 Lindsey used two stat models: one for filtering (RET), the next for scoring (PCM)

WendyAnneWarr #shef2019

WendyAnneWarr #shef2019 Lindsey used PCM from scikit-learn. Janssen biosignatures (from github) also used

WendyAnneWarr #shef2019 Stat models, docking and MD all feed into step 7 of the workflow.

DrJoshuaBox "I was happy to see sandwiches at lunch because they look like my model". Bravo #Shef2019 https://t.co/IH8Ra9kPaU

WendyAnneWarr #shef2019 Lindsey rescores using metadynamics

WendyAnneWarr #shef2019 Lindsey’s work is part of the multi-targeting drug community challenge (DREAM). Cell Chem Biol 2017 24 1434

WendyAnneWarr #shef2019 ZINC12493340 did well for off-targets but alas no good for on-targets. Better results for other mols, but they were not novel or patentable.
    --> baoilleach @WendyAnneWarr #Shef2019 Methods are data dependent. Great variation between xtal structures. Z2-scoring can have significant impact. Metadynamics improved results.

WendyAnneWarr #shef2019 Methods are data dependent. Performance differs between X-ray structures. Z2 scoring can have a significant effect on performance. Metadynamics worth considering even if compute-intensive

WendyAnneWarr #shef2019 metadynamics JCTC 2016 12 2990
    --> RSC_CICAG @WendyAnneWarr Prediction of Protein–Ligand Binding Poses via a Combination of Induced Fit Docking and Metadynamics Simulations https://t.co/d3ZEgLbQIU #Shef2019

RSC_CICAG Next speaker on case studies is the indomitable Bob Clark (@DrBobClark1) from Simulations Plus on Validating property and metabolite predictions for some novel antimalarial compounds. #Shef2019
    --> RSC_CICAG Abstract here: https://t.co/5LzCMYRWuR #Shef2019
        --> RSC_CICAG Bob's take on Machine Learning and stirring the pile of data until they start "being right"! https://t.co/jaflfOym2D #Shef2019 via @xkcdComic https://t.co/WQCSgjv5Pf

            --> RSC_CICAG Looking at stability early on in drug discovery lead optimisation programmes - a prospective study. #Shef2019
                --> RSC_CICAG Using simplified metabolism maps to predict sites of metabolism and report likely yields. #Shef2019
                    --> RSC_CICAG Conclusions
 * Metabolite maps can get very complicated very quickly
 * Potential for autoinhibition can complicate things
 * Value of models in indicating that things might be more complex
 * Unusual things are difficult
                        --> RSC_CICAG Great talk on metabolism from Bob Clark (@DrBobClark1)!!! #Shef2019 https://t.co/NMysuMWVJg

baoilleach #Shef2019 https://t.co/9qXd5KmJMr

baoilleach #Shef2019 Bob Clark on Validating property and metabolite predictions for some novel antimalarial compounds
    --> baoilleach #Shef2019 https://t.co/SKvaKuMq6W on machine learning and stirring the pile until the results *are* right
        --> baoilleach #Shef2019 Going to describe a prospective validation drug design project that started back in 2011. Made molecules that we predicted would be better based on phenotypic screen related to malaria.
            --> baoilleach #Shef2019 Our cmpds worked well on DHODH from P. falciparum. Shows the structures. The ones we thought would be most active were the least active and hardest to make.
                --> baoilleach #Shef2019 Comparing predicted vs actual microsomal degradation, i.e. rates of CYP metabolism. Different models for different Cyps, so first step is to predict which model applies. A lot of Cyps show autoinhibition if you go above the Km. We were testing several fold above Km.
                    --> baoilleach #Shef2019 We went on to look at the identity of the metabolites. Used mass spec. Changes in the fragmentation pattern let you know where exactly the oxidation occurred. Shows the predicted model hot spots for metabolism.
                        --> baoilleach #Shef2019 Shows a metabolite map for one cmpd. The various possible products of metabolism depending on which Cyp. The method gives the %s. It's a testable hypothesis. Compares to actual HPLC-MS results. The CRO charged them by the ion to look for, after the first three.
                            --> baoilleach #Shef2019 Moving onto other examples. Compares observed versus predicted yield. First example, seems reasonable. Next example, predictions not right; must be that 2D6 *is* important in this case.
                                --> baoilleach #Shef2019 Some parent ions fragment too quickly to show up in MS. One of the values of models is to tell you when things are more complicated than one might expect.
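Autoinhibition above Km is commonly modelled with the uncompetitive substrate-inhibition equation v = Vmax·S / (Km + S·(1 + S/Ki)). The sketch below compares it with plain Michaelis-Menten kinetics at several multiples of Km. The parameter values are illustrative only, and the talk did not say which kinetic model (if any) was used.

```python
def rate_michaelis_menten(s, vmax, km):
    """Classic Michaelis-Menten rate."""
    return vmax * s / (km + s)


def rate_substrate_inhibition(s, vmax, km, ki):
    """Uncompetitive substrate (auto)inhibition: rate falls again at high substrate."""
    return vmax * s / (km + s * (1.0 + s / ki))


# Illustrative parameters only: Vmax = 1, Km = 1, Ki = 2 (arbitrary units).
for fold in (0.5, 1, 2, 5, 10):
    s = fold * 1.0  # substrate concentration as a multiple of Km
    print(fold,
          round(rate_michaelis_menten(s, 1.0, 1.0), 3),
          round(rate_substrate_inhibition(s, 1.0, 1.0, 2.0), 3))
```

With these parameters the inhibited rate at five-fold Km is lower than at Km, whereas the Michaelis-Menten rate keeps rising, which shows how testing well above Km can make measured turnover diverge from a simple model.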

WendyAnneWarr #shef2019 Prediction of Protein–Ligand Binding Poses via a Combination of Induced Fit Docking and Metadynamics Simulations | Journal of Chemical Theory and Computation https://t.co/KNUw0l6XKB

WendyAnneWarr #shef2019 @SimulationsPlus Bob Clark used GSK antimalarial actives in a @SimulationsPlus study that started back in 2011

WendyAnneWarr #shef2019 Bob presents table of rates of CYP metabolism in vitro @SimulationsPlus. in some cases autoinhibition expected.

WendyAnneWarr #shef2019 @SimulationsPlus predictions and confidence scores. Now Bob shows structure and mass spectrum for a metabolite of SLP-0006. It fragments nicely at same place across the series. You can then see where oxidation occurs

WendyAnneWarr #shef2019 Bob shows predicted hotspots and metabolite map for SLP-0006 and points out tox risks @SimulationsPlus

suneel_bvs #creativity  #Shef2019 #Computational_chemistry https://t.co/DWQuUKU7yX

WendyAnneWarr #shef2019 Bob now shows HPLC MS data for same compound. Hard to report this without the chemical structures!

WendyAnneWarr #shef2019 Bob presents results for SLP-0004 and 0005. Predicted metabolic hotspots pointed out.

WendyAnneWarr #shef2019 ... and for 0003 @SimulationsPlus. Bob lists considerations and caveats. Metabolite maps can get complicated very quickly. Autoinhibition can complicate things! Models pretty good overall though

WendyAnneWarr #shef2019 ADMET predictor 9.5 models mostly using neural nets. @SimulationsPlus

baoilleach #Shef2019 Chris de Graaf (Heptares) on Structural chemoinformatics tools for GPCR SBDD
    --> baoilleach #Shef2019 GPCRs v important for drug discovery. Field still booming. More+more structures becoming available. GPCR binding mode diversity - can bind all over the place. But how to exploit this growing information?
        --> baoilleach #Shef2019 StaR approach to stabilise structure via point mutations distant from binding site. Then X-ray crystallography. Why is SBDD important to improve drug quality? An atom-by-atom optimisation. Ligand efficiency (LE). Control lipophilicity also.
            --> baoilleach #Shef2019 To illustrate usage of structures, shows A2AR antagonist. Did VS on homology model, and shows how structure was optimised to target lipophilic hotspot. In clinical phase in partnership with AZ.
                --> baoilleach #Shef2019 Ref to Rembrandt "De Samenzwering van Claudius Civilis", one of his last paintings. A masterpiece. The purchaser didn't want it.
                    --> baoilleach #Shef2019 Used KNIME to connect nodes for structure analysis. You should check it out for your use [missed link]. Mapping the bioactive chemical space of the structural GPCRome.
                        --> baoilleach #Shef2019 Compare all these known ligands not only to structures in the PDB but to those in house and the StaRs in house. This doubles the chemical space covered. Can also take the ligand's view. Iterative 2D chemical similarity for scaffold hopping.
                            --> baoilleach #Shef2019 Sub-family specific two-entropy analysis (ssTEA). Low entropy within a subfamily. Now talking about integrating mutation data. Describes a 3D view of mutation effects on GPCR ligand binding. When mapped onto the binding site, they often make sense, though not always (some effects are indirect).
                                --> baoilleach #Shef2019 Back to A2AR. Initially we used a homology model. Several competitions were run on this. Continuing surprises though. Binding between helices in the membrane, for example. Not always possible to address these binding sites, but it opens up the possibility.
                                    --> baoilleach #Shef2019 Peptides can also be used as ligands. We have compared all these binding sites of peptides, and shown that all these lipophilic pockets are in the same spots.
                                        --> baoilleach #Shef2019 Mention of PeptiDream. Used for successful hit generation. Used to address allosteric binding site. Are these sites druggable? Did site identification by ligand competitive saturation (SILCS). Tells us that they are relatively druggable. How to prioritise, and target?
                                            --> baoilleach #Shef2019 All these binding sites have always been present; we've just been ignoring them. Go from 3D to 2D, just like Piet Mondriaan.
                                                --> baoilleach #Shef2019 Development of structural interaction FPs for GPCRs (GPCRBench-IFP). Performs better than PLANTS. For Glide it often performs better but not always. Combining IFPs with energy-based scores worked best. The orthogonal VS approach gave complementary cmpd series.
                                                    --> baoilleach #Shef2019 Protein binding sites are far more complex than shape. WaterFLAP used in house to identify "unhappy" waters which can be displaced by lipophilic moieties or stabilised by hydrophobic.
                                                        --> baoilleach #Shef2019 The water network is an important driver of binding kinetics. Starting to look at how the ligand can stabilise these networks. Can they be used in QSAR models to rank hits?
                                                            --> baoilleach #Shef2019 Orexin receptor structures. Example of where displacement of "unhappy" water molecules is a driver of selectivity. Manuscript in prep on orexin structures...
                                                                --> baoilleach #Shef2019 Now FEP. Out of the box, does not work for GPCRs. In AR, the water network is really key, especially for ranking. A second case, looking at sampling of conformations with FEP. Final case, where FEP has found more potent molecules where synthesis was difficult.
                                                                    --> baoilleach #Shef2019 Poster on combining MMPA with retrosynthesis and RNN...
                                                                    --> baoilleach #Shef2019 Question: what's the next big thing in GPCR discovery? Answer: CryoEM. Can help find less potent ligands. Can identify additional binding sites.
                                                                        --> ydu_sci @baoilleach Can find more GPCR conformations.
                        --> Chris_de_Graaf @baoilleach https://t.co/TR0BAkUHix
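Two-entropy style analyses compare the Shannon entropy of each alignment column across the whole family with the entropy within a subfamily: a position that is conserved within the subfamily but variable across the family is a candidate subfamily-specific residue. A minimal sketch under that assumption; it is not the exact ssTEA scoring, and the toy alignment is made up.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy (natural log) of the residues in one alignment column."""
    n = len(column)
    return sum(-(k / n) * math.log(k / n) for k in Counter(column).values())


def family_and_subfamily_entropy(alignment, subfamily, position):
    """Entropy at one alignment position across all sequences and within one subfamily.

    alignment -- dict: sequence id -> aligned sequence (equal lengths)
    subfamily -- ids of the sequences in the subfamily of interest
    """
    family_column = [seq[position] for seq in alignment.values()]
    sub_column = [alignment[sid][position] for sid in subfamily]
    return shannon_entropy(family_column), shannon_entropy(sub_column)


# Toy alignment (made up): receptors a and b form the subfamily.
alignment = {"a": "KL", "b": "KL", "c": "RM", "d": "EM"}
for pos in range(2):
    fam, sub = family_and_subfamily_entropy(alignment, ["a", "b"], pos)
    print(pos, round(fam, 3), round(sub, 3))
```

Both toy positions are conserved within the subfamily, but position 0 is more variable across the family, so it would rank as the more subfamily-specific residue.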

RSC_CICAG Last talk of the session is from Chris de Graaf (@Chris_de_Graaf) from Sosei Heptares on Structural chemoinformatics tools for GPCR structure-based drug design. #Shef2019
    --> RSC_CICAG Exciting new era of GPCR Structure-Based Drug Design. 62 structures in the PDB, 211 unique GPCR-ligand complexes. Sosei Heptares has > 260 structures with 30 in the @PDBeurope. #Shef2019
        --> RSC_CICAG The Masterpiece from Rembrandt - a lot of the pieces were missing: https://t.co/ii11NjP4Sp #Shef2019 https://t.co/TAG6KVQWe5

            --> RSC_CICAG The 3D-e-Chem code: https://t.co/DHRERliLtb #Shef2019 https://t.co/ZHxYjIRkLG

                --> RSC_CICAG Structural Chemoinformatics: binding site view on GPCR ligands: 3D‐e‐Chem: Structural Cheminformatics Workflows for Computer‐Aided Drug Discovery https://t.co/ZLD6I1bmO2 #Shef2019
                    --> RSC_CICAG Aminergic GPCR–Ligand Interactions: A Chemical and Structural Map of Receptor Mutation Data. https://t.co/jdyMtt1ui3 #Shef2019
                        --> RSC_CICAG Great talk on binding site analysis, specifically on GPCRs. #Shef2019 https://t.co/1pvWJNu9Tf

WendyAnneWarr #shef2019 Chris de Graaf of Sosei Heptares. Structural Cheminf tools for GPCR SBDD

WendyAnneWarr #shef2019 Heptares uses StaR stabilised receptor method. https://t.co/aETDa5H25h

WendyAnneWarr #shef2019 de Graaf. There is still a lot of GPCR discovery that can be done. De Graaf TIPS 2018 39 494

WendyAnneWarr #shef2019 de Graaf 3D-e-chem github

WendyAnneWarr So it had to be cut down in size. Known GPCRs are also only a small part of the story #shef2019 https://t.co/75tmKnoMn8

WendyAnneWarr #shef2019 https://t.co/PrZHMkO5xt https://t.co/bVQsOEhhAR

WendyAnneWarr #shef2019 Conor Scully has a poster on retrosynthetic analysis (Chem Eur J 2017 23 5966) and matched molecular pairs at Heptares

WendyAnneWarr #shef2019 GPCRbench @JCIM_ACS 2016

WendyAnneWarr #shef2019 https://t.co/FxnsBvYg0y. GPCRbench @JCIM_ACS

WendyAnneWarr #shef2019 de Graaf on @Schrodinger WaterMap and Live Design

WendyAnneWarr #shef2019 de Graaf also presents @Schrodinger FEP+ example

WendyAnneWarr #shef2019 Ben Allen of e-therapeutics on network-driven drug discovery.

WendyAnneWarr I love this one! #shef2019 https://t.co/pGXqsxVPpR

WendyAnneWarr #shef2019 https://t.co/Zg3czeg41S process is driven by biology, mechanistic (not black box), using multiple tools and datasets (not just AI), validated. Ben Allen will focus on use of AI for data augmentation.

WendyAnneWarr #shef2019 see https://t.co/saUTrlUIYd. They have been using these tools for about 6 years.

WendyAnneWarr #shef2019 AI is not the core of e-therapeutics’ platform. AI used to augment sparse experimental data with predictions. Are trying to implement Intellegens’ https://t.co/3VrfBF4pVj technology @Optibrium

WendyAnneWarr #shef2019 unfortunately Nathan Brown’s many (and illustrated) tweets are not being picked up by the Sheffield hashtag😭 They may appear later. Meanwhile see them at @nathanbroon
    --> nathanbroon @WendyAnneWarr I’m actually tweeting from @RSC_CICAG
    --> egonwillighagen @WendyAnneWarr @nathanbroon the @RSC_CICAG tweets also don't show up in my #Shef2019 feed (tho they do in my personal feed)
        --> egonwillighagen @WendyAnneWarr @nathanbroon @RSC_CICAG didn't even see any #Shef2019 tweet from @nathanbroon at all
            --> RSC_CICAG @egonwillighagen @WendyAnneWarr @nathanbroon Nothing from @nathanbroon, he is busy tweeting from @RSC_CICAG
        --> adlvdl @egonwillighagen @WendyAnneWarr @nathanbroon @RSC_CICAG I am having the same problem, which is annoying as I am enjoying his talk summaries as gifs quite a lot. For anyone interested in what is going on at #shef2019 it's well worth checking the @RSC_CICAG feed

WendyAnneWarr #shef2019 @nathanbroon Nathan’s tweets were on the hashtag yesterday https://t.co/AZoUQE88DS
    --> macinchem @WendyAnneWarr @nathanbroon Seems to me that a number of tweets I've seen from various people are not appearing when I search #shef2019
        --> WendyAnneWarr @macinchem @nathanbroon 😭

WendyAnneWarr #shef2019 Now Thierry Hanser of Lhasa. Privacy-preserving knowledge transfer from corporate data to federative models

baoilleach #shef2019 Thierry Hanser on privacy preserving knowledge transfer.
    --> baoilleach #Shef2019 We need data, and there's a lot locked into private data silos. If only we could find a way to share this knowledge without leaking chemical structures. There have been many attempts to share data in this way, e.g. an EU project with distributed data sharing.
        --> baoilleach #Shef2019 Proposed solution: Cronos (patron of harvest...harvesting knowledge!). Started with a paper from Google, Papernot et al ICLR 2017. Teacher-student approach. Consensus model of models built on sensitive data used to label non-sensitive data. This is exposed to users.
            --> baoilleach #Shef2019 Distinction between private space and shared space. The non-sensitive labelled data can be shared back among the partners. The student is initially not as wise as the teachers, but will eventually become more knowledgeable than any individual teacher.
                --> baoilleach #Shef2019 The Cronos consortium involved 8 major pharma (and now 10). Needed a way to validate the hypothesis. Start with something fairly easy: hERG. Quite homogeneous thanks to patch clamp assay. Use Preissner hERG dataset.
                    --> baoilleach #Shef2019 If the student is performing better than an average teacher, then we consider the expt successful, i.e. that we are able to transfer knowledge from private data. Used SOHN, Self-Organising Hypothesis Network where descriptors are extended Sybyl Atom pairs.
                        --> baoilleach #Shef2019 Used MCC to assess performance as data is biased towards false negatives. [I think]
                            --> baoilleach #Shef2019 For the knowledge transporter, we used 1M compounds from PubChem, diverse and tractable, and then tiling for homogeneity down to 350K. All the models used to label the data. Consolidated over models using weighted average, based on reliability of prediction.
                                --> baoilleach #Shef2019 Reliability can be seen as the distance of the query to the training data. Should we use all 350K? Ranked them based on agreement on reliability and predicted labels. Take the 11K in which we are most confident (11K most +ve, 11K most -ve). Perfectly balanced.
                                    --> baoilleach #Shef2019 Shows histogram of student performance compared to teachers on external test set. Not only was the student better than the average teacher, it outperformed all the teachers on the external test set. Great surprise, and opens the door to leveraging this private data.
                                        --> baoilleach #Shef2019 Detailed evaluation of student. [missed details] How many teachers were actually required to achieve this performance? Seemed to plateau after 6 teachers. What is the contribution of each? Did leave-one-teacher out CV. All the teachers contributed positively.
                                            --> baoilleach #Shef2019 For a single teacher, the student was usually worse except for two cases. One possible explanation is that the student gets a probability distribution rather than a binary classification. That is, the teacher has curated the data before passing it on.
                                                --> baoilleach #Shef2019 We can trade-off the size of the dataset for confidence, to improve the results further. The student model outperforms ECFP4 and RF models used on the Preissner benchmark.
                                                    --> baoilleach #Shef2019 Second round of collaboration now starting.  Publication in progress.
                                                        --> baoilleach #Shef2019 Nice work @ThierryHanser. Very clever.
                                                            --> dr_greg_landrum @baoilleach @ThierryHanser Yeah, this was super, super cool.
The simplicity of this solution to a very difficult problem is just great.
                                                                --> pschmidtke @dr_greg_landrum @baoilleach @ThierryHanser are slides available? https://t.co/P2F6guIwEK

    --> dgelemi @baoilleach Is it similar (or linked) to the new IMI project MELLODDY?  https://t.co/ytwQA4hL3R
        --> macinchem @dgelemi @baoilleach There is also a consortium using blockchain for something similar? https://t.co/oQ5BxDrmYy
            --> dgelemi @macinchem @baoilleach Yes, that’s MELLODDY 😄
            --> dgelemi @macinchem @baoilleach Blockchain and federated learning was presented by @OWKINscience when I attended a conference in Paris.
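A minimal sketch of the label-consolidation step in the Cronos teacher-student set-up, assuming each teacher returns a probability and a reliability score in [0, 1] for every public compound. The consensus is a reliability-weighted average. As a simplification of the ranking on agreement and reliability described in the talk, compounds are ranked by consensus alone, and the k most positive and k most negative form a balanced student training set. Function names are hypothetical.

```python
import numpy as np


def consolidate(probs, reliability):
    """Reliability-weighted average of teacher probabilities.

    probs, reliability -- arrays of shape (n_teachers, n_compounds), values in [0, 1];
    every compound needs at least one teacher with non-zero reliability.
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(reliability, dtype=float)
    return (weights * probs).sum(axis=0) / weights.sum(axis=0)


def balanced_student_set(consensus, k):
    """Indices and labels of the k most positive and k most negative compounds."""
    consensus = np.asarray(consensus, dtype=float)
    if 2 * k > len(consensus):
        raise ValueError("need at least 2k compounds")
    order = np.argsort(consensus)   # ascending consensus probability
    pos_idx = order[::-1][:k]       # highest consensus -> label 1
    neg_idx = order[:k]             # lowest consensus -> label 0
    idx = np.concatenate([pos_idx, neg_idx])
    labels = np.concatenate([np.ones(k, dtype=int), np.zeros(k, dtype=int)])
    return idx, labels


# Two hypothetical teachers scoring four public compounds.
probs = [[0.9, 0.1, 0.6, 0.4],
         [0.8, 0.2, 0.4, 0.5]]
reliability = [[1.0, 1.0, 1.0, 0.5],
               [1.0, 1.0, 0.0, 1.0]]  # teacher 2 is out of domain for the third compound
consensus = consolidate(probs, reliability)
idx, labels = balanced_student_set(consensus, k=1)
print(consensus, idx, labels)
```

Only the consolidated labels (or probabilities) leave the private space, which is the point of the approach: the student never sees the teachers' training structures.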

WendyAnneWarr #shef2019 you need lots of good data for AI but many data are locked in private silos. So Lhasa has tried Teacher-Student approach. Nicolas Papernot talk at international conference on learning representations (ICLR) 2017

WendyAnneWarr #shef2019 correction! Nathan is tweeting as @RSC_CICAG @macinchem @egonwillighagen https://t.co/Gfor6aePDW
    --> egonwillighagen @WendyAnneWarr @RSC_CICAG @macinchem ah :) okay, so those are showing up in my private feed, but not for #shef2019 

(I actually added #Shef2019 to #shef2019 this morning, but that didn't have an effect)
        --> RSC_CICAG @egonwillighagen @WendyAnneWarr @macinchem Quite frustrating. I’ve got some good content!

WendyAnneWarr #shef2019 the other Lhasa work described at https://t.co/DHYEiWJong

WendyAnneWarr #shef2019 second Lhasa Cronos consortium has been started and a new one (effiris?)

WendyAnneWarr #shef2019 see https://t.co/pbEUchoWmo for Nicolas Papernot’s publications

baoilleach #Shef2019 Mihaela Smilova on Fragment hotspot mapping to drive the semi-automated elaboration of fragment screening hits
    --> baoilleach #Shef2019 Fragment screening is widely used in drug discovery. My project is what to do with the wealth of data generated. We can identify hits, binding modes and interactions. Unfortunately, it can be difficult to interpret, don't know binding affinity, no way to rank hits.
        --> baoilleach #Shef2019 Aims are to develop pipeline for automated suggestions for elaboration. Using fragment hotspot mapping to guide things (Radoux 2016 J Med Chem).
            --> baoilleach #Shef2019 Hotspots are those areas within the binding pocket that make a large contrib to binding affinity. How to summarise data from an ensemble of proteins?
                --> baoilleach #Shef2019 Start with the ensemble. Remove ligands, water. Protonate. Generate hotspot maps, combine into ensemble map by taking the max or median at each point. Subtract the map for Protein B to get a selectivity map.
                    --> baoilleach #Shef2019 When compiling the ensemble map we compared using the max or median. Both had their issues. Max gives max info but also max noise. But the median map fails to id kinase hinge acceptor due to hinge flexibility.
                        --> baoilleach #Shef2019 Needed to reach a compromise between info from polar features and info about freq of features. Found it worked well to use points that occurred over 20% frequency.
                            --> baoilleach #Shef2019 Describing example for bromodomains. When subtracting maps, we can see the Ser->Pro difference but also ....
                                --> baoilleach #Shef2019 Showing analysis of the hotspot selectivity map. Trying to work out how to identify the part that is informative.
                                    --> baoilleach #Shef2019 Now moving onto apolar selectivity maps. Kinase example. Key difference is gatekeeper residue. Very visible in apolar map.
                                        --> baoilleach #Shef2019 Now looking at more distantly related kinases. Looking at difference between a selective and non-selective inhibitor. Again the maps give a clear picture of what's the difference.
                                            --> baoilleach #Shef2019 Fast and intuitive way to visualise binding site diffs, as well as identify areas that may confer selectivity. Future work is to automatically prioritise selective hotspots, and nice to do prospective validation.
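A minimal sketch of combining aligned per-structure hotspot grids into an ensemble map by max, by median, or with a frequency filter (keep a point only if it scores in at least a given fraction of structures, e.g. the 20% mentioned in the talk), then subtracting one protein's map from another's to get a selectivity map. Using the max score at the frequent points and clipping negative differences to zero are assumptions, not details from the talk; the 1-D arrays stand in for 3-D grids.

```python
import numpy as np


def ensemble_map(grids, mode="frequency", min_freq=0.2):
    """Combine aligned per-structure hotspot grids into one ensemble grid.

    mode="max"       -- maximum score at each point (most info, most noise)
    mode="median"    -- median score at each point (can lose flexible features)
    mode="frequency" -- max score, kept only where the point scores > 0 in at
                        least min_freq of the structures (assumed compromise)
    """
    stack = np.stack([np.asarray(g, dtype=float) for g in grids])
    if mode == "max":
        return stack.max(axis=0)
    if mode == "median":
        return np.median(stack, axis=0)
    if mode == "frequency":
        freq = (stack > 0).mean(axis=0)
        return np.where(freq >= min_freq, stack.max(axis=0), 0.0)
    raise ValueError(f"unknown mode: {mode}")


def selectivity_map(map_a, map_b):
    """Points where protein A's ensemble map scores higher than protein B's (negatives clipped)."""
    return np.clip(np.asarray(map_a, dtype=float) - np.asarray(map_b, dtype=float), 0.0, None)


# Five hypothetical structures; 1-D arrays stand in for 3-D grids.
grids = [[5, 0, 2], [0, 0, 3], [0, 0, 1], [0, 0, 0], [0, 1, 0]]
print(ensemble_map(grids, "max"))                      # keeps one-off spikes too
print(ensemble_map(grids, "median"))                   # keeps only the point scoring in most structures
print(ensemble_map(grids, "frequency", min_freq=0.4))  # keeps points seen in >= 40% of structures
```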

WendyAnneWarr #shef2019 last paper of the day by Mihaela Smilova university of Oxford. Fragment hotspot mapping to drive the semi-automated elaboration of fragment screening hits.

WendyAnneWarr #shef2019 Radoux et al. J Med Chem 2016 59 4314 describes fragment hotspot mapping @JMedChem

WendyAnneWarr #shef2019 Smilova. Note that FBDD campaigns result in multiple structures of the same protein.

WendyAnneWarr #shef2019 Bietz and Rarey selectivity maps for polar features @JCIM_ACS cited by Smilova

WendyAnneWarr #shef2019 https://t.co/d0L5CHGx9o @JCIM_ACS https://t.co/8HI0hIqMqv

baoilleach #Shef2019 Peas in a pod. https://t.co/87KdtAsin3

egonwillighagen #Shef2019 https://t.co/gAfWyfKPCF
    --> adlvdl @egonwillighagen We always ask presenters to send their slides/posters after the conference and put on our website those we receive. Hopefully we will have most of them up in the next few weeks

cthoytp Want to hear @nathanbroon’s brexit joke? Power through the post #shef2019 activities and be there on time tomorrow morning!

baoilleach #Shef2019 While waiting for the main event, here's a link to my poster on "A medicinal chemistry based measure of R group similarity":


baoilleach #Shef2019 Peter Ertl on the Encyclopaedia of functional groups
    --> baoilleach #Shef2019 My 7th Sheffield Conference. Never presented a lecture, time to fill this embarrassing empty spot on my CV....
        --> baoilleach #Shef2019 A functional group (FG) is a group of atoms/bonds responsible for the characteristic chem rxns of a molecule. Form the cornerstone of many parts of chemistry, not to mention nomenclature. But defn is a bit vague? How many FGs? Tools not very extensive.
            --> baoilleach #Shef2019 I was particularly interested in natural products and wanted a method to identify all FGs by an algorithmic approach without using any predefined set of patterns.
                --> baoilleach #Shef2019 Details in the publication J Cheminf. 9:36 2017. Extend substructures through configuated bonds (but not aromatic bonds). Three-membered heterocycles are also considered.
                    --> baoilleach #Shef2019 Some details. To which extent we want to retain info about the environment? We stored the generalized form rather than splitting them up, shows example of substituted ureas.
                        --> baoilleach #Shef2019 Most common in PubChem is ether, amide, amine, halogens, alcohols, ester. Only 26 present in more than 1% of molecules. Algo identifies > 150K FGs. Majority are singletons.
                            --> baoilleach #Shef2019 Looking now at ChEMBL. Took FGs from ZINC and plotted ratio of ChEMBL:PubChem. At Novartis, when buying molecules we prioritise those with bioactive FGs.
                                --> baoilleach #Shef2019 Now looking at FGs in natural products. Looking at the difference to PubChem, natural products have v little nitrogen, mostly oxygen.
                                    --> baoilleach #Shef2019 Comparing 'preference' of particular FGs for the different datasets discussed. Zinc vs ChEMBL vs NP. Comparison between sets of molecules using cosine similarity between vectors describing freqs of FGs in sets. Could be used to compare the ligands of related proteins.
                                        --> baoilleach Also J Nat Prod, 2019, 82, 1258.
                                            --> baoilleach #Shef2019 A comparison of FGs between bioactive molecules and PubChem. Can tell us which molecules are easy to make (so don't buy in), versus which are important for bioactivity (so buy in).
                    --> dr_greg_landrum @baoilleach There's also an @RDKit_org implementation in the Contrib folder
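Ertl's dataset comparison (cosine similarity between vectors of FG frequencies) is easy to sketch in Python. The FG names and frequencies below are invented toy values, not numbers from the talk, and storing each vector as a dict keyed by FG name is just one convenient choice.

```python
import math

def cosine_similarity(freq_a, freq_b):
    """Cosine similarity of two sparse FG-frequency vectors stored as {fg_name: frequency}."""
    shared = set(freq_a) & set(freq_b)          # FGs missing from one set contribute 0 to the dot product
    dot = sum(freq_a[fg] * freq_b[fg] for fg in shared)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty FG profile has no direction to compare
    return dot / (norm_a * norm_b)

# Hypothetical fraction of molecules in each set that contain a given FG
bioactive_set = {"ether": 0.40, "amide": 0.35, "amine": 0.30}
natural_products = {"ether": 0.38, "amide": 0.10, "alcohol": 0.45}

print(round(cosine_similarity(bioactive_set, natural_products), 3))
```

A value near 1 means the two sets use FGs in similar proportions; the same function could be applied to FG profiles of the ligands of two related proteins, as suggested in the talk.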

WendyAnneWarr #shef2019 last day. First up Peter Ertl: The encyclopaedia of functional groups.

WendyAnneWarr #shef2019 Ertl J Cheminf 2017 9:36

jwmay First session #Shef2019 https://t.co/JSKIsjDoOD

RSC_CICAG First speaker this morning is the amazing Peter Ertl (@peter_ertl) from Novartis talking on The encyclopaedia of functional groups. Abstract here: https://t.co/cr6XBksNFp #Shef2019

ssirimulla Bunch of #compchem meetings (#shef2019 #molkin2019 #freeenergymeeting2019 #aidd_cecam) are happening right now. Why am I not attending one of these meetings? Will be there at #caddgrc and #grc_drugmetabolism next month.

WendyAnneWarr #shef2019 the algorithm is described in that publication. An open source implementation in RDKit at github https://t.co/zIklGaFlP9

WendyAnneWarr #shef2019 Ertl identified about 150,000 functional groups

WendyAnneWarr #shef2019 Ertl says a Java version has now been done in CDK
    --> egonwillighagen @WendyAnneWarr I assume that refers to this paper? https://t.co/ZCxbfE6SUX  #Shef2019

WendyAnneWarr #shef2019 Conference dinner last night was in Cutlers’ Hall https://t.co/kJIV24tPXn

baoilleach #Shef2019 Paul Hawkins on Traversing enormous regions of chemical space with the GPU
    --> baoilleach #Shef2019 The philOEsophy is that shape and electrostatics are fundamental. Douglas Adams' quote on "Space is big..." applied to chemical space. Enamine REAL is 1.43 billion.
        --> baoilleach #Shef2019 How to search these spaces? 2D methods are FAST: millions of mols/CPU/s. 3D methods much slower: 10s-100s mols/CPU/s. Can we accelerate?
            --> baoilleach #Shef2019 Move to the GPU, is it worth it? GPUs are expensive, substantial algo changes are required, and good developers are hard to find. On Amazon, 2c per hr vs 43c per hr for CPU vs GPU. Needs to be 25X faster to be cost effective, but >50X to be operationally effective.
                --> baoilleach #Shef2019 GPUs accelerate 2D search modestly, 7x faster. Not worthwhile. Ref to Imran Haque paper. For larger databases, fall back to CPU.
                    --> baoilleach #Shef2019 We should be representing molecules with shape and electrostatics, not just 2D or 3D. Description of ROCS. Rigid alignment of query to db molecules to maximise shape and electrostatic overlay. 150 mols/CPU/s.
                        --> baoilleach #Shef2019 At the end we get the sum of the shape and electrostatic scores, the TanimotoCombo (TC). Probability of activity in lead opt can be predicted by TC. Ref to Muchmore and belief theory papers.
                            --> baoilleach #Shef2019 FastROCS: ROCS on the GPU (2011). 2000X speedup. ~500K mols/GPU/s. Year after year it got faster with better GPUs (Moore's Law). More recently, better coding. Now doing 700K mols/GPU/s on the fastest GPUs.
                                --> baoilleach #Shef2019 How did we do it? Better handling of the memory on the GPU to store the conformers. Optimized the GPU code.
                                    --> baoilleach #Shef2019 Results not identical. The tools perform numerically the same but the results are not identical. Spearman rho 0.9.
                                        --> baoilleach #Shef2019 Describes Orion. CADD in the cloud. Case study of FastROCS in the cloud. Billions of molecules. Designed to be simple to use for infrequent non-expert users.
                                            --> baoilleach #Shef2019 Do we get better molecules if we search larger and larger virtual library spaces? Grebner et al, JCIM. manuscript submitted.
                            --> olexandr @baoilleach Do you have full references by chance?
                                --> dr_greg_landrum @olexandr @baoilleach One of the primary belief theory papers (in chemistry) is this one: https://t.co/8R28x8thBR
                                    --> olexandr @dr_greg_landrum @baoilleach Thanks
                        --> Dondoesart1 @baoilleach 👊
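The "Spearman rho 0.9" comparison in Hawkins' talk is a rank correlation: it asks whether two versions of a tool put database molecules in the same order, even when the raw scores differ slightly. A minimal stdlib sketch (Pearson correlation of tie-averaged ranks); the score lists are hypothetical, not FastROCS output.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores for the same five molecules from two implementations
cpu_scores = [1.52, 1.31, 0.98, 1.10, 0.75]
gpu_scores = [1.50, 1.33, 1.12, 1.08, 0.79]
print(spearman_rho(cpu_scores, gpu_scores))  # 0.9: one adjacent pair is swapped
```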

hjuinj My first talk at Sheffield! (2 days ago...)  #Shef2019 https://t.co/XIQIPxXqGe

WendyAnneWarr #shef2019 now Paul Hawkins traverses enormous regions of chemical space with the GPU

WendyAnneWarr #shef2019 https://t.co/evAVs0BVWY

WendyAnneWarr #shef2019 space is really really big. Hawkins quotes Douglas Adams. Consider Enamine REAL

GJPvWesten Starting with Douglas Adams.. Simply great :) #shef2019

WendyAnneWarr #shef2019 Hawkins. 3D search of such spaces can be very slow. So port to GPU. AWS spot instances have advantages

WendyAnneWarr #shef2019 Hawkins. For 2D search not cost effective to port to GPU. Perhaps 7 times faster.
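Hawkins' cost argument comes down to simple arithmetic. At the AWS prices quoted in the talk (2c/hr CPU vs 43c/hr GPU), a GPU only matches a CPU on cost per molecule when it is 43/2 = 21.5 times faster, close to the ~25X threshold quoted. The 7X seen for 2D search falls short, while the ~2000X of FastROCS clears it easily. The function names are ours, and the sketch ignores development cost and the separate "operational" threshold.

```python
def break_even_speedup(cpu_cost_per_hr, gpu_cost_per_hr):
    """Speedup over one CPU at which a GPU costs the same per molecule screened."""
    return gpu_cost_per_hr / cpu_cost_per_hr

def gpu_to_cpu_cost_ratio(speedup, cpu_cost_per_hr, gpu_cost_per_hr):
    """GPU cost per molecule divided by CPU cost per molecule (< 1 means the GPU is cheaper)."""
    return (gpu_cost_per_hr / speedup) / cpu_cost_per_hr

CPU_CENTS_PER_HR = 2.0    # AWS price quoted in the talk
GPU_CENTS_PER_HR = 43.0   # AWS price quoted in the talk

print(break_even_speedup(CPU_CENTS_PER_HR, GPU_CENTS_PER_HR))           # 21.5
print(gpu_to_cpu_cost_ratio(7, CPU_CENTS_PER_HR, GPU_CENTS_PER_HR))     # ~3.07: 2D search costs more on GPU
print(gpu_to_cpu_cost_ratio(2000, CPU_CENTS_PER_HR, GPU_CENTS_PER_HR))  # ~0.011: shape search far cheaper
```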

WendyAnneWarr #shef2019 Hawkins. Molecules have SHAPE plus atoms. OpenEye ROCS recognises this. Hawkins @JMedChem 2007 50 74

WendyAnneWarr #shef2019 TanimotoCombo predicts biology well shows Hawkins. FastROCS on GPU about 2K times faster
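ROCS-style scores rest on overlaps of atom-centred Gaussians. The sketch below is a simplified illustration under stated assumptions, not OpenEye's implementation. All Gaussians share one assumed height and width, overlaps are first-order only, the molecules are taken as already aligned (ROCS itself optimises the alignment), and "colour" is a single feature type. Shape Tanimoto is O_AB / (O_AA + O_BB - O_AB), and TanimotoCombo adds the shape and colour Tanimotos, giving a score between 0 and 2.

```python
import math

# Illustrative, assumed parameters: every atom or colour feature is a Gaussian
# p * exp(-alpha * r^2) with the same height and width (real tools vary these).
HEIGHT = 2.7
ALPHA = 0.84

def pair_overlap(c1, c2, a1=ALPHA, a2=ALPHA, p=HEIGHT):
    """Analytic overlap integral of two isotropic 3D Gaussians centred at c1 and c2."""
    d2 = sum((x - y) ** 2 for x, y in zip(c1, c2))
    s = a1 + a2
    return p * p * (math.pi / s) ** 1.5 * math.exp(-a1 * a2 * d2 / s)

def volume_overlap(centres_a, centres_b):
    """First-order overlap: sum of all pairwise Gaussian overlaps (no higher-order terms)."""
    return sum(pair_overlap(a, b) for a in centres_a for b in centres_b)

def tanimoto(centres_a, centres_b):
    """O_AB / (O_AA + O_BB - O_AB) for two pre-aligned sets of Gaussian centres."""
    o_ab = volume_overlap(centres_a, centres_b)
    return o_ab / (volume_overlap(centres_a, centres_a) + volume_overlap(centres_b, centres_b) - o_ab)

def tanimoto_combo(shape_a, shape_b, colour_a, colour_b):
    """Shape Tanimoto + colour Tanimoto, so the result lies between 0 and 2."""
    return tanimoto(shape_a, shape_b) + tanimoto(colour_a, colour_b)

# Hypothetical, already-aligned heavy-atom coordinates (angstroms)
query_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
hit_atoms = [(0.1, 0.0, 0.0), (1.6, 0.2, 0.0), (3.1, 0.0, 0.3)]
# One hypothetical colour feature (e.g. an H-bond acceptor) per molecule
query_colour = [(3.0, 0.0, 0.0)]
hit_colour = [(3.4, 0.0, 0.0)]

print(tanimoto_combo(query_atoms, hit_atoms, query_colour, hit_colour))
```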

WendyAnneWarr #shef2019 Hawkins. Nowadays FastROCS is even faster: 1.6 million mols per sec per instance.

WendyAnneWarr #shef2019 Hawkins. Orion offers CADD in the cloud. @OpenEyeSoftware

WendyAnneWarr #shef2019 FastROCS on Orion on server for non-experts. Or in Floe, the Orion workflow @OpenEyeSoftware, for trillions of compounds

egonwillighagen cc @pschmidtke #Shef2019 https://t.co/ca9nY8Lt71

WendyAnneWarr #shef2019 larger databases increase hit similarity says Hawkins @OpenEyeSoftware small increase in TanimotoCombo gives big increase in biological impact @OpenEyeSoftware

WendyAnneWarr #shef2019 i.e., bigger number of hits https://t.co/KEI0IqOPzV

WendyAnneWarr #shef2019 Hawkins gives some mind-boggling stats about short time and tiny cost @OpenEyeSoftware

WendyAnneWarr #shef2019 Hawkins also summarises docking with FRED on CPU @OpenEyeSoftware

WendyAnneWarr #shef2019 https://t.co/bKnF6pKY30 @OpenEyeSoftware

WendyAnneWarr Correction! Increase hit DIVERSITY. Grebner MS #shef2019 @OpenEyeSoftware https://t.co/KEI0IqOPzV

baoilleach #Shef2019 Barbara Zdrazil on Linking chemical and biological space for tracking target innovation trends.
    --> baoilleach #Shef2019 Looking at innovation in drug discovery over time. Historically, drug discov is dominated by 4 protein target families: GPCRs, kinases, ion channels and NRs. Nat Rev Drug Discov 2017, 16, 19.
        --> baoilleach #Shef2019 Target innovation refers to the discov and exploitation of new targets. The review previously mentioned shows over time the approval year for the families discussed.
            --> baoilleach #Shef2019 Can we use preclinical data (as found in ChEMBL) to provide a meaningful measure to explore trends in research attention in target space? Use different measures, e.g. no. of published cmpds, no. of papers, no. of targets, drug/target annotation such as efficacy.
                --> baoilleach #Shef2019 Also use GO terms, to look at trends from a biological perspective. Disease trends too. Can we tie it all together to get a network view and find new potential links?
                    --> baoilleach #Shef2019 Looked at 6 major target classes. DisGeNET used for disease annotations. Shows trend in GO terms normalised by the no. of bioactivities per year. Use Wald test [?] for statistical signif.
                        --> baoilleach #Shef2019 Recently kinases have outpaced GPCRs on all four measures of target innovation. Peaks for kinases are due to large screening studies.
                            --> baoilleach #Shef2019 Investigations of GO term trends. Targets related to immune system processes are on the rise. In GPCRs, trend is driven by an increase in opioid, cannabinoid and CC-chemokine receptors. For kinases, it's Janus kinases.
                                --> baoilleach #Shef2019 Decrease in circulatory system process targets. It seems that these are now being modulated by downstream interventions.
                                    --> baoilleach #Shef2019 Describing disease evolution over time. For cancer related diseases such as prostate cancer, JAKs contribute to the upward trend. Targets related to intellectual disability have been more heavily investigated recently.
                                        --> baoilleach #Shef2019 Now describing the network analysis. Dynamic networks - changing over time. Suggests a new link between two diseases, and possible similar treatments for both.
                                            --> baoilleach #Shef2019 Preprint available hopefully next week.
                                                --> BZdrazil @baoilleach Thank you Noel for this perfect summary of my talk @rguha @nathanbroon
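One way to run the trend test mentioned in Zdrazil's talk (GO-term counts normalised by the number of bioactivities per year, Wald test for significance) is sketched below. These are assumptions, not details from the talk: the trend is an ordinary least-squares slope of the normalised share against year, the Wald statistic is z = slope / SE(slope), and the p-value uses a normal approximation (a t distribution would be more exact for few years). The yearly counts are invented.

```python
import math

def wald_trend_test(years, term_counts, total_bioactivities):
    """OLS trend of (term count / total bioactivities) per year, with a Wald z-test on the slope."""
    shares = [c / t for c, t in zip(term_counts, total_bioactivities)]
    n = len(years)
    if n < 3:
        raise ValueError("need at least three years of data")
    mean_x = sum(years) / n
    mean_y = sum(shares) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, shares)) / sxx
    # residuals from the fitted line, written in centred form to avoid a large intercept
    residuals = [(y - mean_y) - slope * (x - mean_x) for x, y in zip(years, shares)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)   # residual variance
    se = math.sqrt(sigma2 / sxx)                       # standard error of the slope
    if se == 0.0:
        raise ValueError("perfect fit: Wald statistic is undefined")
    z = slope / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided, normal approximation
    return slope, z, p_value

# Invented counts of bioactivities annotated with an immune-system GO term
years = list(range(2008, 2016))
immune_counts = [120, 150, 160, 210, 230, 260, 300, 310]
all_bioactivities = [9000, 9500, 10000, 10500, 11000, 11500, 12000, 12500]

slope, z, p = wald_trend_test(years, immune_counts, all_bioactivities)
print(f"slope={slope:.5f}/yr  z={z:.1f}  p={p:.2g}")
```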

WendyAnneWarr #shef2019 Barbara. Four protein families dominate drug targets. Santos Nat Rev Drug Discovery 2017

WendyAnneWarr #shef2019 Santos 2017 16 19 target trends

WendyAnneWarr #shef2019 can we use ChEMBL to study trends in target space over time? and GO bio process trends and disease trends DisGeNET

WendyAnneWarr #shef2019 kinases have outpaced GPCRs recently

WendyAnneWarr #shef2019 GO term evolution: kinases and ion channels positive trend, proteases and NRs negative

WendyAnneWarr #shef2019 targets related to immune system processes are on the rise. Janus kinases. Circ system processes recur but currently falling

WendyAnneWarr #shef2019 disease evolution e.g., cancer related diseases. breast cancer related targets plotted.

WendyAnneWarr #shef2019 targets related to intellectual disability upward trend. Can study emerging targets

WendyAnneWarr #shef2019 finally a network view of targets, GO terms and diseases. Dynamic network. Network connectivity over the years: JAKs, AKT, RAF are of increasing interest. Keep GO term fixed and look at trends in diseases. Immune system process networks shown

WendyAnneWarr #shef2019 link between breast cancer and schizophrenia!

WendyAnneWarr #shef2019 Barbara’s talk will be on bioRxiv next week

WendyAnneWarr #shef2019 upward trend for schizophrenia too https://t.co/rgBEnm60BC

baoilleach #Shef2019 @nathanbroon praises Peter Willett's contributions over 40 years, and the fact that many here would not be here without him. Now the Peter Willett poster award goes to... James Webster.

baoilleach #Shef2019 Esben Jannik Bjerrum (AZ) on SMILES based encoders
    --> baoilleach #Shef2019 Describing SMILES. There are some long-range interactions between brackets and between ring symbols.
        --> baoilleach #Shef2019 A recurrent neural network takes seqs of features as input. Very useful for modelling sequences such as text, tweets, time series, etc. For SMILES, we do a one-hot encoding with a defined vocabulary.
            --> baoilleach #Shef2019 Especially useful are LSTM (long short-term memory) cells, which ensure that the layers don't forget what happened earlier. This architecture allows it to have hidden state and remember long-range interactions.
                --> baoilleach #Shef2019 SMILES string fed in, starts with <GO>, ends with <STOP>. For use as an encoder, it is turned into a vector.
                    --> baoilleach #Shef2019 Canonical SMILES ensure a 1:1 relationship between mol and SMILES. I go the other way, and generate multiple SMILES for the same mol - this works as a data augmentation. Shows RDKit code to generate 'random SMILES'. Shows 11K SMILES from the same drug.
                        --> baoilleach #Shef2019 RNNs can also be used as generators. We train the LSTM network to predict the next character in the sequence. REINVENT: Olivecrona et al. J Cheminf 2018.
                            --> baoilleach #Shef2019 SMILES-based autoencoders. We read in SMILES. Can use RNNs (or CNNs). Can manipulate the latent space to follow a distribution or for a QSAR task. Then decode to SMILES. Ref to Gomez-Bombarelli paper.
                                --> baoilleach #Shef2019 One of the problems was that non-canonical SMILES for the same molecule could be scattered all over the latent space for Conv2RNN. Because a SMILES is not a molecule - it's a string representation.
                                    --> baoilleach #Shef2019 A solution is to use heteroEncoders. Now non-canonical SMILES start to cluster. [missing details] Can measure the distance between SMILES string representations based on sequence alignment. Also used Tanimoto.
                                        --> baoilleach #Shef2019 This is about developing a latent space that is somewhat related to existing molecular similarity measures (not to recreate those measures exactly).
                                            --> baoilleach #Shef2019 The same latent space point just generates 1 molecule in Can2Can but 111 for Enum2Enum. How is this done, the one-to-many mapping? The canonical one is absolutely sure what character is next. The Can2Enum model has variations in probabilities, and so can generate many.
                                                --> baoilleach #Shef2019 These latent vectors can be used as a base for QSAR models. We see a clear increase in performance as we go from the canonical to the enumeration training. Surprising was the poor performance of ECFP4 in these models.
                                                    --> baoilleach #Shef2019 Looking now at trends when searching for hyperparams. Larger is better for the decoder (except for linear QSAR model). Shows the final architecture. Read SMILES into two bidirectional LSTMs. The states go into the code layer. 99% valid SMILES.
                                                        --> baoilleach #Shef2019 The latent space appears to be non-linear which is why it works well with non-linear QSAR methods but not so well with linear.
                                                            --> baoilleach #Shef2019 Hard to beat REINVENT which works quite well.
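The SMILES input described in Bjerrum's talk (start and end tokens, one-hot encoding over a defined vocabulary) can be sketched in plain Python. The tokenizer is a simplified assumption: it treats bracket atoms such as `[nH]` and the halogens `Cl` and `Br` as single tokens and everything else character by character. The `<GO>`/`<STOP>` names follow the talk; padding with `<STOP>` and the helper names are illustrative choices, not the talk's code.

```python
import re

# Simplified tokenizer (assumption): bracket atoms, Cl/Br, then any single character.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|.")
GO, STOP = "<GO>", "<STOP>"

def tokenize(smiles):
    """Split a SMILES string into tokens framed by start and end markers."""
    return [GO] + TOKEN_RE.findall(smiles) + [STOP]

def build_vocab(smiles_list):
    """Map every token seen in the training SMILES to a column index."""
    tokens = {tok for smi in smiles_list for tok in tokenize(smi)}
    return {tok: idx for idx, tok in enumerate(sorted(tokens))}

def one_hot(smiles, vocab, max_len):
    """Encode one SMILES as a max_len x len(vocab) 0/1 matrix, padding with <STOP>."""
    tokens = tokenize(smiles)
    if len(tokens) > max_len:
        raise ValueError(f"{smiles!r} needs {len(tokens)} positions, max_len is {max_len}")
    tokens += [STOP] * (max_len - len(tokens))
    matrix = []
    for tok in tokens:
        row = [0] * len(vocab)
        row[vocab[tok]] = 1   # raises KeyError for tokens outside the vocabulary
        matrix.append(row)
    return matrix

training_smiles = ["CCO", "OC(=O)c1ccccc1Br", "c1cc[nH]c1", "CCl"]
vocab = build_vocab(training_smiles)
encoded = one_hot("CCO", vocab, max_len=12)
print(len(encoded), "x", len(encoded[0]))
```

Each row has exactly one 1, so an LSTM sees the string as a sequence of vocabulary choices; the ring digits and brackets that must match up are what give the long-range interactions mentioned in the talk.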

A note on the toolkits: RDKit, Keras/Tensorflow, PyTorch. Code on github molvecgen. Coming soon: Deep-Drug-Coder.

RSC_CICAG First speaker of the final Machine Learning session is Esben Jannik Bjerrum from AstraZeneca Molecular hetero-encoders derived descriptors and their use in QSAR and de novo generation. https://t.co/67msp9hwM5 #Shef2019
    --> RSC_CICAG https://t.co/N9Dvv8Sw7j #Shef2019
        --> RSC_CICAG Great talk on encoders for molecular design. #Shef2019 https://t.co/t7Nt85dGnp
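Bjerrum's talk compared SMILES representations by a sequence-alignment distance. As a simpler stand-in (an assumption, not necessarily the alignment scoring used in the talk), the Levenshtein edit distance shows how two valid SMILES of the same molecule can sit apart as strings, which illustrates the point that a SMILES is a string representation rather than a molecule.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))            # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

# Two valid SMILES each for ethanol and for phenol
print(levenshtein("CCO", "OCC"))              # 2
print(levenshtein("c1ccccc1O", "Oc1ccccc1"))  # 2
```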

WendyAnneWarr #shef2019 Esben Bjerrum et al. Biomolecules 2018 8 (4) 131 and Bjerrum ArXiv:1703.07076 (2017)

WendyAnneWarr #shef2019 SMILES based autoencoder ACS Central Science 2018 4(2) 268
    --> WendyAnneWarr #shef2019 Fergus Imrie also cited this paper
        --> A_Aspuru_Guzik @WendyAnneWarr It is a good paper <grin>

WendyAnneWarr @ACSCentSci #shef2019 https://t.co/JoIb88ewAm

WendyAnneWarr #shef2019 Nathan Brown chaired first session today @RSC_CICAG https://t.co/UuvsgAom1x

WendyAnneWarr #shef2019 James Webster’s winning poster was reaction vector based Monte Carlo tree search for de novo design. Here he is (left) getting award from Peter Willett https://t.co/8WkUitoaam https://t.co/18EdVcgd0O

WendyAnneWarr #shef2019 https://t.co/Xolnznf89I

cthoytp More code at #shef2019 from AstraZeneca (https://t.co/kkd7FkPXJV) https://t.co/E0tCWiznp4

    --> ApitiusHofmann @cthoytp Is that meeting paying off?
        --> daniel_sunday @ApitiusHofmann @cthoytp Charlie realized how much he misses Chemistry in his life https://t.co/OhES203Tl0

baoilleach #Shef2019 Fergus Imrie on Deep generative models for 3D compound design from fragment screens
    --> baoilleach #Shef2019 Recently graph-based approaches have shown some promise, but most are still text-based. Tasks are either arbitrary molecule generation (sampling from a distribution, general or focussed) or molecular optimisation (optimise a chemical property given a molecule).
        --> baoilleach #Shef2019 One problem is that the user has little control. Also, these methods are not suited to other design tasks. For example, scaffold hopping and fragment linking.
            --> baoilleach #Shef2019 Shows fragment hits in binding site. How can I create a molecule that contains these fragments but conserves the 3D shape and can still bind in the same manner?
                --> baoilleach #Shef2019 Existing tools are based on db of known linkers which are ranked based on whether they can join these fragments up. Prob is that you are ltd to the db contents, and cannot explore beyond. We have developed a de novo generative method to do this.
                    --> baoilleach #Shef2019 We still have a database of linkers but use it to learn how to join fragments together. Method called "DeLinker".
                        --> baoilleach #Shef2019 It's a graph-based deep learning method. Each one of our nodes has a hidden vector assoc with it. Initially all of the atoms are of the same type (i.e. have the same vector). The internal repr is updated to reflect the environ around them. Use Message passing NN.
                            --> baoilleach #Shef2019 After it's run for many steps, the info comes from further and further away. Next, we initialise expansion nodes. Next we go thru an iterative process of adding bonds to the possible nodes.
                                --> baoilleach #Shef2019 We do this with Feed-forward NN. It takes into a/c the time step (is this the first atom?), the node embeddings, average embeddings and structural info on the distance and angle we want.
                                    --> baoilleach #Shef2019 The bond order is set with another FF NN. The initial step is now repeated to update atom environments. Repeat until the atoms are joined. Not all nodes will be used, but this is not a problem.
                                        --> baoilleach #Shef2019 How to train? Train in a supervised manner with known linkers. We encode both the fragments and the linked fragments using the earlier procedure. [details missed] When decoding we add noise to get a diverse set of generated linkers.
                                            --> baoilleach #Shef2019 Creating a training set from ZINC, where we create 3D conformations, snip up molecules to create fragments joined by a linker. To test, using the CASF-2016 dataset, set of 250 diverse binders in active conformations. Also published fragment screens.
                                                --> baoilleach #Shef2019 How to assess our model? Different measures: valid, unique, novel, recovered. Also 2D property filters. Compare results to those predicted from the db of known linkers. Not always valid (92-95%), but novel, and better recovered.
                                                    --> baoilleach #Shef2019 Also 3D measures of similarity to assess, using RDKit's shape and colour similarity score (SCrdkit), and RMSD.
                                                        --> baoilleach #Shef2019 Moving onto a case study showing an exptal xtal structure of two fragments bound and three molecules in xtal poses described in the original paper. Our method recovered these and found more.
                                                            --> baoilleach #Shef2019 Now looking at de novo design vs exhaustive db search wrt a case study. Even if you love your db, it might not solve your problem: in this case, de novo design did better.
                                                                --> baoilleach #Shef2019 Question about whether it can handle cases where there is a barrier in between the fragments. Answer: not yet, but we're working on it.
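The generation metrics listed in Imrie's talk (valid, unique, novel, recovered) are simple ratios once you have a batch of generated molecules. A minimal sketch, assuming the outputs have already been canonicalised (e.g. with RDKit) and that failed parses are marked as `None`. The function name and the exact denominators are illustrative assumptions; definitions vary between papers.

```python
def generation_metrics(generated, training_set, reference):
    """Summary ratios for a batch of generated molecules.

    generated: canonical SMILES strings, with None for outputs that failed to parse.
    training_set: canonical SMILES seen in training. reference: the known molecule to recover.
    """
    valid = [smi for smi in generated if smi is not None]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        "valid": len(valid) / len(generated) if generated else 0.0,
        "unique": len(unique) / len(valid) if valid else 0.0,
        "novel": len(novel) / len(unique) if unique else 0.0,
        "recovered": reference in unique,
    }

# Hypothetical canonical SMILES from five sampling attempts; None marks an invalid one
samples = ["CCOC", "CCOC", None, "CCNC", "CCCC"]
print(generation_metrics(samples, training_set={"CCCC"}, reference="CCNC"))
```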

RSC_CICAG Next up is Fergus Imrie from the University of Oxford speaking on Deep generative models for 3D compound design from fragment screens. Abstract here: https://t.co/K7pONOsfRH #Shef2019
    --> RSC_CICAG Fragment linking and scaffold hopping. Two ways of looking at the same problem. #Shef2019
        --> RSC_CICAG Generation of molecules is done atom by atom. #Shef2019
    --> RSC_CICAG Using message passing neural networks: https://t.co/Pyffd0j5uP #Shef2019
    --> RSC_CICAG Terrific talk on designing fragment linkers using deep learning. #Shef2019 https://t.co/xy24Vbxfqy
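The message-passing step in Imrie's DeLinker description (atoms start with the same hidden vector and, over successive rounds, absorb information from further and further away) can be illustrated with a toy update. This is not DeLinker's network. The fixed update h_v <- tanh(w_self*h_v + w_msg*sum of neighbour vectors) is an assumption standing in for learned functions, and the weights and graph are hypothetical.

```python
import math

def message_passing(adjacency, hidden, steps, w_self=0.5, w_msg=0.5):
    """Toy MPNN: synchronously set h_v <- tanh(w_self*h_v + w_msg*sum of neighbour vectors)."""
    for _ in range(steps):
        new_hidden = []
        for v, h_v in enumerate(hidden):
            msg = [0.0] * len(h_v)
            for u in adjacency[v]:                      # aggregate messages from neighbours
                msg = [m + x for m, x in zip(msg, hidden[u])]
            new_hidden.append([math.tanh(w_self * a + w_msg * m) for a, m in zip(h_v, msg)])
        hidden = new_hidden
    return hidden

# Hypothetical 4-atom chain 0-1-2-3 (e.g. the heavy atoms of butane)
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
initial = [[1.0, 0.5] for _ in range(4)]   # every atom starts with the same vector

h1 = message_passing(adjacency, initial, steps=1)
h2 = message_passing(adjacency, initial, steps=2)
print(h1)
print(h2)
```

After one round the terminal atoms already differ from the internal ones because they have fewer neighbours, while symmetry-equivalent atoms stay identical; more rounds fold in information from atoms further along the chain.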

WendyAnneWarr #shef2019 problems in fragment elaboration are one reason why FBDD is failing to live up to its promise. Hence Imrie’s work.

WendyAnneWarr #shef2019 Imrie graph-based deep generative methods for fragment elaboration combining state of the art machine learning techniques with structural knowledge

WendyAnneWarr #shef2019 Imrie is at univ Oxford. Co-authors are Anthony Bradley of exscientia, @exscientialtd Mihaela van Dee Schwartz of Cambridge and Alan Turing institute, and Charlotte Deane of Oxford

RSC_CICAG The winner of the first Peter Willett Award for Outstanding Poster Presentation goes to James Webster. @RSC_CICAG are proud to be sponsoring this award in recognition of both Peter and great new talent. #Shef2019 (photo courtesy of @WendyAnneWarr) #CompChem #RealTimeChem https://t.co/XpB53y8aA1

    --> BZdrazil @RSC_CICAG @WendyAnneWarr Absolutely deserved!

WendyAnneWarr #shef2019 Imrie’s conclusions are in final para of his abstract below https://t.co/oWdaBQFpQT

baoilleach #Shef2019 Marwin Segler on Planning chemical syntheses with deep neural networks and Monte Carlo tree search
    --> baoilleach #Shef2019 (First speaker to include their Twitter handle: @marwinsegler)
        --> baoilleach @marwinsegler #Shef2019 Describes the molecular design cycle, test/design/make. Ref to GuacaMol benchmark for de novo mol design. JCIM 2019, 59, 1096.
            --> baoilleach @marwinsegler #Shef2019 Why help with the 'make' part of the cycle? Provide better routes for chemists and for de novo design, predict synthesisability. "Break out of your Suzuki-Amide comfort zone!" Ref to @DrBostrom "Expanding the med chem synthetic toolbox"
                --> baoilleach @marwinsegler @DrBostrom #Shef2019 The current state of route design in chemistry is similar to these old route maps we can look up. We have our databases like Reaxys and SciFinder.
                    --> baoilleach @marwinsegler @DrBostrom #Shef2019 The vision is a GPS system that can get us from A->B reliably in a reasonable amount of time.
                        --> baoilleach @marwinsegler @DrBostrom #Shef2019 Describing retro synthesis vs synthesis planning. We take known reactions and extract a general rule; reversing the rule yields a transform, and we apply the transform to novel targets.
                            --> baoilleach @marwinsegler @DrBostrom #Shef2019 Now showing a retrosyn tree. A start state and several goal states. Keep doing decompositions from the start state until you arrive at building blocks you can buy - the goal state.
                                --> baoilleach @marwinsegler @DrBostrom #Shef2019 Shows many possible ways to make the same simple molecule. The task is to rank all the possible ways of making it. The avg no. of rules is 46K, and the depth is 5 to 20. 3 steps would already be 100TB of data.
                                    --> baoilleach @marwinsegler @DrBostrom #Shef2019 The challenge is to prioritise the rules. Vleduts (1963) and then Corey (1968). [Suggests we all read the Vleduts paper - visionary].
                                        --> baoilleach @marwinsegler @DrBostrom #Shef2019 Now showing a question about Suzuki vs Kumada. The Grignard would react with the aldehyde and so shouldn't be used in this case.
                                            --> baoilleach @marwinsegler @DrBostrom #Shef2019 Corey took a decade to encode his knowledge, but so much more information on reactions available now and so manual coding is probably not the way to go.
                                                --> baoilleach @marwinsegler @DrBostrom #Shef2019 Our idea is to learn transformation rules from data, use machine-learning to prioritise rules, learn to predict reactions.  Ref to Saller et al Org Process Res Dev 2015, 19, ....
                                                    --> baoilleach @marwinsegler @DrBostrom #Shef2019 We use the Reaxys dataset in our study. 11M reactions. Then we use deep learning. The way we train is to use a time split; train on <= 2014 and predict since then.
                                                        --> baoilleach @marwinsegler @DrBostrom #Shef2019 How well does it predict the rule? 31% of the time. Top 10 is 63%. 73% in top 50. So we just use the top 50 instead of all the rules.
                                                            --> baoilleach @marwinsegler @DrBostrom #Shef2019 We need to filter out infeasible reactions. We build a DNN model to do this. We generate infeasible reactions to train on, as we don't have any of our own.
                                                                --> baoilleach @marwinsegler @DrBostrom #Shef2019 Could define heuristics, e.g. create evenly sized fragments from larger molecule. We use Monte Carlo Tree Search (MCTS). Not dependent on strong heuristics. Based on multi-armed bandit problem.
                                                                    --> baoilleach @marwinsegler @DrBostrom #Shef2019 There is an upper confidence bound for action being optimal (UCT). A balance of exploration vs exploitation. The UCT is expanded to PUCT using probabilities (ref to Rosin). Tested on 50-molecule dataset from Chematica paper.
                                                                        --> baoilleach @marwinsegler @DrBostrom #Shef2019 Did the chemical Turing test. Double blind. Show the two routes from papers or from computer. Can they tell them apart? Chemists cannot distinguish between the two sources.
                                                                            --> baoilleach @marwinsegler @DrBostrom #Shef2019 Unsolved challenges are condition prediction, yield prediction, natural products, cannot invent novel reactions (though see Segler Chem Eur J 2017).
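Segler described tree search that balances exploration and exploitation with UCT, extended to PUCT using the policy network's probabilities. A minimal sketch of the two selection rules follows. The PUCT form shown is the common AlphaGo-style one; the constants, reaction names and visit statistics are illustrative assumptions rather than values from the talk or the paper.

```python
import math
from dataclasses import dataclass

@dataclass
class Child:
    """Statistics for one candidate disconnection (child node) in the search tree."""
    name: str
    prior: float             # P(s, a): assumed to come from the rule-prediction network
    visits: int = 0          # N(s, a)
    total_value: float = 0.0

    @property
    def q(self):
        """Mean value of rollouts through this child (0 if unvisited)."""
        return self.total_value / self.visits if self.visits else 0.0

def uct_score(child, parent_visits, c=1.4):
    """Classic UCT: unvisited children are tried first."""
    if child.visits == 0:
        return math.inf
    return child.q + c * math.sqrt(math.log(parent_visits) / child.visits)

def puct_score(child, parent_visits, c=3.0):
    """AlphaGo-style PUCT: exploration bonus scaled by the prior probability."""
    return child.q + c * child.prior * math.sqrt(parent_visits) / (1 + child.visits)

def select(children, score_fn):
    """Pick the child with the highest score under the given selection rule."""
    parent_visits = max(1, sum(ch.visits for ch in children))
    return max(children, key=lambda ch: score_fn(ch, parent_visits))

children = [
    Child("amide coupling", prior=0.6, visits=10, total_value=6.0),
    Child("Suzuki coupling", prior=0.3, visits=2, total_value=1.4),
    Child("reductive amination", prior=0.1),
]
print(select(children, uct_score).name)    # reductive amination (unvisited)
print(select(children, puct_score).name)   # Suzuki coupling
```

UCT sends the next visit to the untried branch regardless of how plausible it is, whereas PUCT lets a low prior delay exploring it. With tens of thousands of possible rules per step, that prior-guided pruning is what makes the search tractable.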

RSC_CICAG Last speaker of the session, and of the entire conference, is Marwin Segler (@marwinsegler) from @benevolent_ai talking on Planning chemical syntheses with deep neural networks and Monte Carlo tree search. #Shef2019
    --> RSC_CICAG Open benchmarking framework for de novo molecular design: GuacaMol: Benchmarking Models for de Novo Molecular Design https://t.co/nOJ5VFiD1n #Shef2019 @marwinsegler @benevolent_ai
        --> RSC_CICAG We are good at designing and testing, but making remains a significant challenge in #Chemoinformatics. #Shef2019
            --> RSC_CICAG Reactions growing by an order of magnitude every decade. Corey ‘only’ had a million reactions in the 60s. #Shef2019
                --> RSC_CICAG Great talk on synthesis planning from @marwinsegler to close the conference. #Shef2019 https://t.co/gR6dvqv2Sr
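The 31% / 63% / 73% figures quoted in Segler's talk are top-k accuracies of the rule-prediction network: how often the rule actually used is among the k highest-ranked. A minimal sketch of how such a figure is computed; the function name and the rule IDs are invented for illustration.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of cases whose true label appears among the first k ranked predictions."""
    if not true_labels:
        return 0.0
    hits = sum(1 for preds, truth in zip(ranked_predictions, true_labels) if truth in preds[:k])
    return hits / len(true_labels)

# Hypothetical ranked rule predictions for four reactions, best first
ranked = [["r7", "r2", "r9"], ["r1", "r4", "r3"], ["r5", "r8", "r6"], ["r2", "r3", "r1"]]
truth = ["r7", "r3", "r6", "r9"]

for k in (1, 2, 3):
    print(k, top_k_accuracy(ranked, truth, k))
```

Keeping only the top-k rules at each expansion, as described for the top 50, trades a small loss in coverage for a much smaller search tree.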

WendyAnneWarr #shef2019 Segler et al Chem Eur J 2017 23 5966; Nature 2018 555 604

WendyAnneWarr #shef2019 Segler et al @ACSCentSci 2018

WendyAnneWarr #shef2019 @ACSCentSci 2018 4 120 https://t.co/odiHifKths

WendyAnneWarr #shef2019 Nathan Brown, Segler et al had poster on Guacamol https://t.co/xA1CDQ2uqH

WendyAnneWarr #shef2019 paper on Guacamol brown et al @JCIM_ACS 2019 59 1096 (thank you @SciFinder !)

curephile Thanks to @baoilleach @WendyAnneWarr and @RSC_CICAG. Enjoyed following #Shef2019 on Twitter

baoilleach #Shef2019 I'm done! Bye.
    --> egonwillighagen @baoilleach thanks!
    --> RSC_CICAG @baoilleach https://t.co/5UgQMnN7Y6

    --> curephile @baoilleach Wow! That was great!
    --> rguha @baoilleach Many thanks for fantastic coverage of #Shef2019
    --> ozkirimli_elif @baoilleach Thank you!!
    --> Chris_de_Graaf @baoilleach Thanks!
    --> ssirimulla @baoilleach Thank you. https://t.co/bBz4o1xjRZ

    --> hjuinj @baoilleach mamba out.

WendyAnneWarr #shef2019 correction Mihaela van der Schaar https://t.co/Ui2K63K2DH

CKannas It was great #Shef2019! It was great being back in Sheffield after 3 years and to my second Sheffield Chemoinformatics Conference (my 1st 2013). Thanks #SCRG!

WillPitt1 Thanks very much to the organisers of #Shef2019. Great conference as usual.

RSC_CICAG We had pride of place at #Shef2019 as one of the sponsors. Terrific meeting as always and a big round of applause to the organising team! https://t.co/TXtEwZG4o5

WendyAnneWarr #shef2019 Noel @nmsoftware did a really great job with the science. I gather that you had a keyboard, Noel. I was being all thumbs with my phone. Decided eventually that it was pointless to duplicate much of what you wrote. So I did photos and citations etc. later. Great meeting! https://t.co/ZQpBtuyqCS
    --> baoilleach @WendyAnneWarr @nmsoftware Yes, I'm on my laptop. I saw what you were doing and thought we worked well together. I'm going to pull out all the tweets into a web page for easy reference.
        --> nathanbroon @baoilleach @WendyAnneWarr @nmsoftware I did all the GIFs and memes!
            --> WendyAnneWarr @nathanbroon @baoilleach @nmsoftware I know. How infuriating that Twitter messed RSC about. You should flag this issue with Twitter.
                --> nathanbroon @WendyAnneWarr @baoilleach @nmsoftware We will raise a ticket as this has happened before. I suspect that their algorithms detect us as spammers as our tweet frequency is quite variable.
                    --> WendyAnneWarr @nathanbroon @baoilleach @nmsoftware But many people have tweet variance. I do hundreds a day at a conference but then may go to ground for two weeks.
                        --> nathanbroon @WendyAnneWarr @baoilleach @nmsoftware Precisely. Hopefully we can get this sorted and perhaps blue-ticked so we are recognised as a professional society.
                            --> baoilleach @nathanbroon @WendyAnneWarr @nmsoftware I think all the tweets are there. I'll tidy them up and you can have a look.
    --> adlvdl @WendyAnneWarr @nmsoftware I hope it is solved for the next conference and I'm looking forward to @baoilleach twitter summary of the conference

BZdrazil Bye, bye 🇬🇧! It was a fantastic conference! Thanks to the organization! What could be improved though is the percentage of female speakers. If I counted correctly it was 4 or 5 women out of 27 speakers in total #WomenInScience  #shef2019
    --> nathanbroon @BZdrazil Completely support your sentiment and I raised this at committee. We have a long way to go.
        --> BZdrazil @nathanbroon Thanks for support regarding this issue :)
            --> nathanbroon @BZdrazil I am drafting some guidelines for the various conferences I am involved in organising. Where we can make a difference, we are honour-bound to be that difference.

RSC_CICAG We awarded our first of many awards in honour of Prof. Peter Willett, a titan of #Chemoinformatics at the #Shef2019 conference. Many in the audience would not be doing great science if it were not for his contributions: https://t.co/5MotEQxxqn #CompChem #RealTimeChem https://t.co/M4lZ07uLbo

griffen_ed #Shef2019: excellent meeting, speakers, posters and discussion in and out of the sessions. Many thanks to the organising committee.

BZdrazil Have missed my connecting flight when returning from #Shef2019, at least they serve some proper English breakfast here in Frankfurt :) https://t.co/lfAI73PoWw

    --> baoilleach @BZdrazil Oh no. You were worried about that :-/
        --> pwk2013 @baoilleach @BZdrazil Lucky you since I have heard that German breakfasts are the wurst.
        --> BZdrazil @baoilleach Yes! Self fulfilling prophecy :(