experiment. This showed that the fingerprint was able to re-extract 48 out of our 50 reaction types, proving its applicability for searching similar reactions. In a separate exercise, not described here, we have used this approach to find clusters of similar reactions within the unclassified reactions of the patent data, allowing us to detect new or not yet registered reaction types. One drawback of the current approach to generate the fingerprint is its dependence on the atom mapping to identify the agents or, more generally, on the correct annotation of agents and nonagents. Improper handling of agents has been shown to introduce a reasonable amount of noise into the fingerprint.

was also confirmed in the reaction superclasses, achieving a mean F-score of 0.82 (mean precision = 0.93, mean recall = 0.76).

These results demonstrate that intratype similarities vary widely between types. At one end of the spectrum is the methylation reaction type, where a broad chemical variety is observed. At the other end, one finds examples like the hydroxy to methoxy (1.7.4) transformation and the methyl esterification (1.7.6) or the Fischer-Speier esterification (2.6.3), all of which involve adding a carbon atom to an OH, where both intra- and intertype variability are low.
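
To make the relation between the reported averages concrete, the following minimal Python sketch computes an F-score from precision and recall. It assumes that the F-score is the usual harmonic mean of precision and recall (F1) and that the reported means are unweighted averages over classes; neither detail is stated in this passage. The per-class values in `toy_classes` are hypothetical and serve only to illustrate that the mean of per-class F-scores generally differs from the F-score of the mean precision and mean recall, so 0.93 and 0.76 need not reproduce 0.82 exactly.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 if both are zero."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def macro_average(per_class):
    """Unweighted averages over classes.

    per_class: list of (precision, recall) pairs, one pair per class.
    Returns (mean precision, mean recall, mean of the per-class F-scores).
    """
    n = len(per_class)
    mean_p = sum(p for p, _ in per_class) / n
    mean_r = sum(r for _, r in per_class) / n
    mean_f = sum(f_score(p, r) for p, r in per_class) / n
    return mean_p, mean_r, mean_f


# F-score computed directly from the reported mean precision and mean recall.
f_of_means = f_score(0.93, 0.76)

# Hypothetical per-class (precision, recall) values, NOT taken from the paper.
toy_classes = [(1.0, 1.0), (0.9, 0.5)]
toy_p, toy_r, toy_f = macro_average(toy_classes)

print(f"F-score of the reported means: {f_of_means:.3f}")
print(f"toy mean of per-class F-scores: {toy_f:.3f}")
print(f"toy F-score of the means:       {f_score(toy_p, toy_r):.3f}")
```

With the reported means, `f_of_means` evaluates to about 0.84, slightly above the reported mean F-score of 0.82, which is what one would expect if the F-scores were averaged per class.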
CONCLUSIONS

In this study, we presented the development of a novel reaction transformation fingerprint that can be used for model building as well as similarity search. We showed that applying the transformation fingerprint in a multi-class prediction model

ASSOCIATED CONTENT

Supporting Information

Further figures and results of this study are summarized in the PDF. An Excel sheet contains the 226 unclassified patent
reactions that were manually checked. Training and test sets constructed from the patent data set are in zipped Python pickle files. Also included are all IPython notebooks to reproduce the results shown in the paper. This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author

*E-mail: gregory.landrum@novartis.com.

Notes

The authors declare the following competing financial interest(s): R.S. and D.L. are employees of NextMove Software, which markets the NameRxn tool used in this contribution.

ACKNOWLEDGMENTS

N.S. thanks the Novartis Institutes for BioMedical Research education office for a Presidential Postdoctoral Fellowship. The authors thank Nikolas Fechner for his help with scikit-learn and many insightful discussions on machine learning and fingerprints. The authors also thank Mike Tarselli for his help with manually classifying the set of unclassified patent data.

ABBREVIATIONS

AP, atom pairs; ELN, electronic laboratory notebook; FP, fingerprint; kNN, k-nearest neighbors; LR, logistic regression; ML, machine learning; NB, naive Bayes; RF, random forest; TT, topological torsions

Journal of Chemical Information and Modeling, Article
J. Chem. Inf. Model. 2015, 55, 39-53. DOI: 10.1021/ci5006614