Quantifying structure and performance diversity for sets of small molecules comprising small-molecule screening collections.

Proc Natl Acad Sci U S A
Abstract

Using a diverse collection of small molecules, we recently found that compound sets from different sources (commercial; academic; natural) have different protein-binding behaviors, and these behaviors correlate with trends in stereochemical complexity for these compound sets. These results lend insight into structural features that synthetic chemists might target when synthesizing screening collections for biological discovery. We report extensive characterization of structural properties and diversity of biological performance for these compounds and expand comparative analyses to include physicochemical properties and three-dimensional shapes of predicted conformers. The results highlight additional similarities and differences between the sets, but also the dependence of such comparisons on the choice of molecular descriptors. Using a protein-binding dataset, we introduce an information-theoretic measure to assess diversity of performance with a constraint on specificity. Rather than relying on finding individual active compounds, this measure allows rational judgment of compound subsets as groups. We also apply this measure to publicly available data from ChemBank for the same compound sets across a diverse group of functional assays. We find that performance diversity of compound sets is relatively stable across a range of property values as judged by this measure, both in protein-binding studies and functional assays. Because building screening collections with improved performance depends on efficient use of synthetic organic chemistry resources, these studies illustrate an important quantitative framework to help prioritize choices made in building such collections.
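The abstract names an information-theoretic measure of performance diversity with a specificity constraint but does not define it. The following is a minimal sketch of one generic way such a measure could be computed, not the measure as defined in the paper. It assumes each compound has a binary protein-binding profile (1 = binds, 0 = does not). It keeps only compounds that bind at least one and at most `max_hits` proteins, which stands in for the specificity constraint. It then takes the Shannon entropy, in bits, of the distribution of distinct profiles among those compounds. The function name `performance_entropy` and the parameter `max_hits` are hypothetical.

```python
from collections import Counter
from math import log2


def performance_entropy(profiles, max_hits):
    """Shannon entropy (bits) of distinct binding profiles among compounds
    that bind at least one and at most max_hits proteins.

    Hypothetical illustration only; not the paper's exact definition.
    """
    # Specificity constraint (assumed form): drop inactive compounds
    # (no hits) and promiscuous compounds (more than max_hits hits).
    specific = [tuple(p) for p in profiles if 1 <= sum(p) <= max_hits]
    if not specific:
        return 0.0
    counts = Counter(specific)
    n = len(specific)
    # H = -sum_i p_i * log2(p_i) over the distinct profiles i
    return -sum((c / n) * log2(c / n) for c in counts.values())


# Usage: five compounds screened against three proteins.
profiles = [(1, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 0)]
print(performance_entropy(profiles, max_hits=1))  # ≈ 0.918 bits
```

Here the promiscuous compound (1, 1, 1) and the inactive compound (0, 0, 0) are excluded. The three remaining compounds share two distinct profiles in a 2:1 ratio. A subset whose specific binders spread evenly over many distinct profiles would score higher. This matches the abstract's idea of judging compound subsets as groups rather than by individual actives.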

Year of Publication
2011
Journal
Proc Natl Acad Sci U S A
Volume
108
Issue
17
Pages
6817-22
Date Published
2011 Apr 26
ISSN
1091-6490
DOI
10.1073/pnas.1015024108
PubMed ID
21482810
PubMed Central ID
PMC3084049
Grant list
P50-GM069721 / GM / NIGMS NIH HHS / United States
P20-HG003895 / HG / NHGRI NIH HHS / United States
Howard Hughes Medical Institute / United States
P50 GM069721 / GM / NIGMS NIH HHS / United States
N01CO12400 / CA / NCI NIH HHS / United States
N01-CO-12400 / CO / NCI NIH HHS / United States
P20 HG003895 / HG / NHGRI NIH HHS / United States