- Open Access
Fast 3D shape screening of large chemical databases through alignment-recycling
© Fontaine et al 2007
- Received: 04 April 2007
- Accepted: 06 June 2007
- Published: 06 June 2007
Large chemical databases require fast, efficient, and simple ways of looking for similar structures. Although such tasks are now fairly well resolved for graph-based similarity queries, they remain an issue for 3D approaches, particularly for those based on 3D shape overlays. Inspired by a recent technique developed to compare molecular shapes, we designed a hybrid methodology, alignment-recycling, that enables efficient retrieval and alignment of structures with similar 3D shapes.
Using a dataset of more than one million PubChem compounds of limited size (< 28 heavy atoms) and flexibility (< 6 rotatable bonds), we obtained a set of a few thousand diverse structures covering entirely the 3D shape space of the conformers of the dataset. Transformation matrices gathered from the overlays between these diverse structures and the 3D conformer dataset allowed us to drastically (100-fold) reduce the CPU time required for shape overlay. The alignment-recycling heuristic produces results consistent with de novo alignment calculation, with better than 80% hit list overlap on average.
Overlay-based 3D methods are computationally demanding when searching large databases. Alignment-recycling reduces the CPU time to perform shape similarity searches by breaking the alignment problem into three steps: selection of diverse shapes to describe the database shape-space; overlay of the database conformers to the diverse shapes; and non-optimized overlay of query and database conformers using common reference shapes. The precomputation, required by the first two steps, is a significant cost of the method; however, once performed, querying is two orders of magnitude faster. Extensions and variations of this methodology, for example, to handle more flexible and larger small-molecules are discussed.
- Shape Space
- Reference Shape
- Diverse Shape
- PubChem Compound
- Single Conformer
Databases of chemical structures are a key component of chemical information infrastructures. Searching these databases requires specialized methods, for example, to find similar chemical structures.
The computational cost of 3D methods, however, is dramatically greater than 2D methods, due to the relative complexity of generating, selecting, and comparing various 3D representations of chemical structures. The cost is particularly severe when the comparisons are done by structural overlay, when considering the additional step of determining an optimal 3D overlay. As such, some groups have focused, for example, on extending 2D methods for the discovery of topologically non-obvious similar compounds using reduced-graph approaches [6–8]. With the increase of available computer power, fast 3D structural overlay software, such as ROCS , has become attractive for large database screening.
where O AB is the volume overlap between conformer A and conformer B, O A is conformer A volume, and O B is conformer B volume
Several published applications of ROCS demonstrate its usefulness in practical medicinal chemistry projects [11–13]. ROCS can screen the dataset used in this work at the rate of ~1 800 conformers per second per (64-bit 3-GHz Intel dual core Xeon) processor. Although this is a remarkable speed for this kind of software, it can still take hours to perform a single search of a moderately sized 3D database containing millions of conformers.
Innovative overlay-based approaches [14, 15] have been created to avoid brute-force comparison between a query conformer and each and every conformer in a 3D database. One approach  involves finding a small "dictionary" of 3D structures that represent the overall diversity of possible 3D shapes. These diverse shapes are then used to create a binary "3D fingerprint" for each conformer in a database, with each set bit corresponding to a computed similarity above a predefined threshold between the diverse shape and the database conformer. This technique shifts the substantial 3D computational overhead into the initial selection of diverse shapes and the generation of the 3D fingerprint for all conformers in the database. For each 3D similarity query, the workflow now becomes identical to that of 2D binary fingerprint methods: compute the fingerprint for the query; loop over the database contents; and determine the bits in common for computation of Eq. 1. After a 3D fingerprint is designed and created, such an approach can significantly reduce the time to search moderately sized 3D databases, e.g., by shape similarity, from hours to minutes.
There are two major differences between the results from brute-force ROCS shape overlay comparison and the 3D shape fingerprint  similarity method. Firstly, the two methods use very different measures for the Tanimoto values and are not guaranteed to give similar results. Secondly, the 3D shape fingerprint similarity approach does not provide a 3D alignment with the query, thus making the results difficult to analyze or visualize. In this study, we attempt to modify an earlier 3D shape similarity approach  to mimic results provided by brute-force ROCS similarity searching, but at a fraction of the computational expense. A novel aspect of our method, which we call "alignment-recycling", comes from recycling the translational and rotational matrices resulting from the shape overlay during the initial selection of diverse shapes.
At the time of project initiation, the PubChem Compound  database contained approximately 5.3 million unique chemicals and mixtures. We focused our attention on a subset of PubChem by targeting only single-component molecules with size and flexibility below lead-like  or drug-like  filtering cut-offs. Our strategy was to work with a simple but relevant subset that could be incrementally updated with more challenging compounds in future studies.
The alignment-recycling (AR) methodology is intended to obviate performing the optimization required to maximize the volume overlap of the query conformer to each and every conformer in a 3D conformer dataset. This is achieved by selecting representative conformers to completely cover the "shape space" of the 3D conformer dataset. The granularity of coverage is defined by an empirical cutoff named "Design-Tanimoto" (see section Reference shape selection). Each conformer in the dataset is overlaid to each representative conformer and the overlay information is retained, if the similarity with a representative conformer is of sufficient magnitude.
The empirical criterion to decide if two overlaid conformers can be considered similar is named "Transform-Tanimoto" (see section Alignment recycling). Its value greatly influences the number of reference shapes associated with each conformer. By means of analogy to a binary fingerprint, the Transform-Tanimoto threshold defines when a bit is set.
To search the dataset by shape similarity, the query fingerprint, to extend the analogy, is compared to the dataset fingerprints to find common reference shapes. The Tanimoto value computed between query and database fingerprints with AR is not that from Eq. 1, as is typical with 2D fingerprint methods and used by the 3D fingerprint method of Haigh et al. . Instead, finding a common reference shape triggers computing, via Eq. 2, the shape Tanimoto between the query conformer and database conformer, as may be performed by a typical brute-force ROCS approach. In our method, the 3D conformer overlay used in computing the shape Tanimoto is generated by simply reusing the transformation, i.e., the rotation matrix and translation vector, from the overlay to the common reference shape. This trivial transformation, while specific to alignment of a reference conformer, when applied, can yield a relatively accurate shape overlay between the query and database conformers without the need to perform the conformer overlay alignment optimization. Usually, when the query and a database conformer are fairly similar, multiple reference shapes are found to be in common. In such cases, all reference shape alignments are reused to find a maximum shape Tanimoto between conformers.
Reference shape selection
As described in the Method section, we implemented the clustering algorithm of Haigh et al.  to select a diverse set of reference shapes. For this study, we chose a Design-Tanimoto value of 0.75, which, according to their work, represented the best trade-off between sampling speed and granularity. This means, by definition, no pair of reference shapes has a similarity above 0.75, after diversity selection, and that every conformer in the entire dataset is associated with at least one reference shape with a shape Tanimoto similarity above 0.75.
Diverse reference shape selection for the one million compound dataset was performed in two stages. In the first stage, only a single conformer representative generated by OMEGA 1.8.1  was used. The single conformer dataset was entirely covered after the inclusion of 2 458 reference shapes. In the second stage, we sampled the conformational space of each compound using OMEGA 2.0 Beta at an RMSD of 1.0 Å. This generated approximately fifteen million (14 925 817) conformers. The distribution of conformers per compound is strongly skewed towards low values, with 50% of the compounds having six or fewer conformers and only 10% of the compounds accounting for 49% of the total conformer count. Interestingly, 99.8% of the fifteen million conformers in the second stage can be clustered at a Design-Tanimoto of 0.75 using one of the initial 2 458 reference shapes of the single conformer subset, revealing a large amount of shape redundancy in the multi-conformer models. However, the shape space of the remaining 0.2% conformers increases the number of diverse reference shapes from 2 458 to 5 534. This potentially surprising result may be a consequence of the sphere-exclusion algorithm variant used for the reference shape selection. In the attempt to cover the entire dataset shape space with a minimum number of reference shapes, the algorithm tends to leave 'holes' in the shape space, thus producing unequally sampled regions. Given the substantial redundancy of conformer shapes in the multi-conformer model dataset, it is very likely that a large fraction of the additional 3 076 reference shapes is necessary to fill these holes. There is no direct indication that the additional reference shapes resulted from anything more than sampling deficiency, i.e., were not directly attributable to the size or flexibility of molecules. Use of more efficient sampling algorithms designed to avoid empty spaces, e.g., DISE , may lead to more efficient shape space coverage than that used in this study.
Because we aim at selecting a diverse set of shapes, the reference conformers appear to represent particular structural features to a greater extent than are present in the entire dataset. For example, only 20% of the PubChem dataset contain chiral centers; however, 33% of the reference shapes contain a chiral center. Similarly, a (non-exhaustive) trend is found between the dataset and reference conformers for triple bonds (8% versus 37%), lack of aromatic atoms (6% versus 21%), and presence of a ring system with more than six atoms (3% versus 28%). As a consequence, reference shapes generated from structures with less common features tend to cluster fewer database conformers than those coming from compounds with more common features.
Each time an alignment is attempted after transformation matrices combination, the quality of the alignment is evaluated by a single point shape Tanimoto estimation via a Gaussian Grid approximation similar to ROCS, as detailed in the Methods section. The final number of matrix multiplications and alignments depends on the Transform-Tanimoto value as well as the number of reference shapes in the vicinity of the query conformer.
In practice, a single combination of transformation matrices cannot guarantee a result close to an optimal structural alignment. Some conformers may have different optimal alignments with a reference shape due to structural symmetry; however, the presence of multiple reference shapes greatly increases the chance of finding an alignment very close to the analytical maximum overlap solution. A convenient property of the method is that similar structures tend to have more reference shapes in common than dissimilar ones, thus far more CPU time is dedicated to the alignment of similar structures than for dissimilar structures.
Finding the right Transform-Tanimoto
Similarity searches often require a threshold as a simple criterion to prune the hit list. The threshold value is somewhat subjective although a reasonable range of useful values can be deduced from the literature involving ROCS. Rush et al.  mention a general rule-of-thumb that a shape Tanimoto value greater than 0.75 provides visual shape similarity, although they used a 0.85 threshold to select their ZipA-FtsZ protein-protein inhibitors. According to a regression plot from Bostrom et al.  and our own in-house experience, a RMSD cut-off of 1.0 Å used during conformational sampling with OMEGA 2.0 Beta roughly corresponds to a shape Tanimoto between 0.75 and 0.85. In their virtual screening study, Muchmore et al.  found a melanin-concentrating hormone receptor 1 antagonist with nanomolar IC50 at a shape Tanimoto above 0.80. Taking these studies into account, our range of interest in finding similar shapes is limited to ROCS shape Tanimoto between 0.75 and 1.0, alignments with lower similarity values were not considered for this work.
The suitable Transform-Tanimoto value, which determines if two structures have a reference shape in common and enables alignment via matrix multiplication, was determined empirically. For that, we performed a set of random overlays using both ROCS and alignment-recycling. Our objective was to keep the Transform-Tanimoto value as high as possible to limit the possible number of matrix combinations and, in doing so, save substantially on CPU time. We started by setting the Transform-Tanimoto value to the Design-Tanimoto value, i.e., 0.75. When applying the AR technique, alignment cases where two conformers do not share a common reference shape are assigned a shape Tanimoto value of zero. Because the initial Transform-Tanimoto threshold was not providing the quantity of hits to be consistent with the brute-force approach, primarily due to not finding appropriate reference shapes in common, we progressively decreased the Transform-Tanimoto value by 0.01.
The distribution in Figure 7 also indicates that about 1% of the time alignment-recycling performs relevantly better (shape Tanimoto difference > 0.01) than ROCS. One possible explanation for this observation is that ROCS gets locked into a local minimum during overlap optimization. A more likely explanation is differences in the numerical precision of the ROCS Grid method versus ours (see section Gaussian shape overlay). Overall, the chance of getting a poor AR alignment, as compared to one produced by ROCS, is relatively rare, when using a 0.73 Transform-Tanimoto value and considering the full 0.75–1.0 shape Tanimoto range.
Comparing speed and hit lists
CPU time to query the fifteen million conformer database with a single conformer
CPU Time (min.)
AR-0.73 (Total) a
AR-0.73 (Screening part)
AR-0.74 (Screening part)
AR-0.75 (Screening part)
Query vs. 5 534 AR Reference Shapes
This increase in throughput is not surprising. For each conformer, we are only ever optimizing the overlay to the query for the 5 534 reference shapes. Also, the conformer database reference shape fingerprints are quite sparse, having only 1, 40, or 141 reference shapes set at minimum, average, or maximum, respectively. In contrast, ROCS requires optimizing the overlay of the query conformer to all fifteen million database conformers. As a means of comparison, CPU times required to search the dataset at Transform-Tanimoto values equal to 0.74 and 0.75 are also shown in Table 1. These timings indicate a two- and four-fold decrease, respectively, directly related to a substantial decline in the number of reference shapes considered during screening. This also suggests that each additional 0.01 decrease in the Transform-Tanimoto will increase the AR method CPU requirement by a factor of two.
Average compound hit list size resulting from querying the fifteen million conformer database with a single conformer
Shape Tanimoto Threshold
Figure 10b shows the ability of AR-0.73 to reproduce a growing percentage of the ROCS compound hit list as the AR-0.73 screening threshold is decreased, while keeping the ROCS screening threshold constant. As Figure 10c shows, however, that simply decreasing the AR-0.73 screening threshold only improves the union of the two compound hit lists to a point, after which diminishing returns sets in and the hit list overlap becomes worse. This result is expected considering decreasing the AR-0.73 screening threshold results in both ROCS hits missed by AR-0.73 and AR-0.73 hits that would be found by ROCS, if the ROCS screening threshold was not kept constant.
Comparison between ROCS and AR-0.73 hit lists when using reduced similarity thresholds for AR-0.73
Compound hit list category
Hits found by both methods
Hits missed by AR-0.73
No common reference
With common reference
Hits missed by ROCS
Threshold related a
The observed correction for maximum overlap of AR-0.73 and ROCS hit lists as a function of shape Tanimoto appears to be linear. If this relationship holds across the entire range of ROCS shape Tanimoto values of 0.75 to 1.0, one could employ Eq. 3 to select the appropriate AR-0.73 shape Tanimoto cut-off to use for a desired ROCS shape Tanimoto value to achieve maximum overlap of results.
STAR-0.73 = 1.04 * ST ROCS - 0.04
where STAR-0.73 is the suggested optimum AR-0.73 shape Tanimoto value to use for a corresponding shape Tanimoto value, ST ROCS , in the range of 0.75 to 1.0.
The AR-0.73 method consistently reproduces ROCS results emphasizing that conformers with similar shapes tend to overlay to each other in a similar way. As such, overlay of two conformers, A and B, to a reference conformer, R, may generate an excellent approximation to the ideal alignment of conformers A and B by simply (re)using the alignments AR and BR. After finding a suitable set of reference shapes, the CPU cost to search for similar conformers across datasets of millions can be dramatically reduced. While efficient, the alignment-recycling method, AR-0.73, outlined in this work does have its limitations.
AR-0.73, in its current form, cannot be used for sub-shape comparison since global alignments are used. One can, however, readily imagine a subshape-based 3D fingerprint, much like dictionary-based 2D fingerprints. The implementation of such a method is beyond the scope of this work.
If a similar (enough) reference shape is not present when comparing two conformers, poor shape alignments may result, causing hits to be found by ROCS but missed by AR-0.73. If no reference shape is found to be in common, AR-0.73 cannot produce an alignment.
As the molecular size and flexibility increase, the number of required reference shapes is likely to increase dramatically to generate accurate shape overlays, which is probably the most important drawback of the AR-0.73 method. Reductions in the Design-Tanimoto can counter large increases in the number of reference shapes; however, in our experience, such a reduction in the Design-Tanimoto threshold, and concomitant reduction of the Transform-Tanimoto, can result in a reduction in the average quality of reproduction of the optimal overlay and an increased computational cost due to the consideration of additional conformers in alignment-recycling portion of the method. The overlay quality can be dramatically improved, in this situation, by slightly altering the methodology provided in this work to perform a post overlay optimization, using the near-optimal alignment-recycling overlay as a starting point for shape overlay optimization, providing substantial computational savings in the absence of such information. This proposed methodology extension may provide the means to apply aspects of the alignment-recycling method to larger and more flexible small molecules by eliminating the requirement that the recycled alignment reproduce the optimal alignment, thus allowing the Design-Tanimoto and Transform-Tanimoto thresholds to be (substantially) reduced.
Another drawback to AR-0.73 is that the primary computational expense is borne before any shape similarity searches are performed. For the fifteen million conformers used in this study, it took about four CPU years to compute the shape fingerprints using 64-bit 3-GHz Intel dual-core Xeon processors. Computational cost of the fingerprint generation is essentially recovered, however, after performing the same number of searches as there are reference shapes.
AR-0.73, while substantially reducing the CPU cost of shape similarity searching, adds concomitant demands on storing alignments to the reference shapes that must be available during the search. For the fifteen million conformer dataset, the (non-optimized) storage requirement for the fingerprints and rotational/translational information is 32 GB. If one is not careful, simultaneous access to this data can be a significant bottleneck.
If the AR-0.73 method is used with a dynamic database of conformers, additional computational costs can be envisioned. As new conformers are added, new reference shapes must be added dynamically whenever existing reference shapes cannot represent a new conformer. Addition of a new reference shape will require the precomputation step of comparing all existing database conformers to the new reference shape. After many new reference shapes are added (> 50% more of the initial total), a complete re-sampling of the reference shapes may be warranted to improve overall search performance through a reduction in the number of reference shapes. Also, for efficiency purposes, as conformers are deleted from the database, care must be taken to ignore reference shapes that no longer represent any database conformer to prevent unnecessary comparisons to a redundant reference shape.
With the above caveats in mind, the AR-0.73 method as described should be useful to speed the search of any 3D conformer dataset, regardless of size or flexibility. There should be no need to further modify the Transform-Tanimoto and Design-Tanimoto values of 0.73 and 0.75, respectively, to provide, e.g., complementary results to a ROCS search in the shape Tanimoto range of 0.75 – 1.0. The diverse reference shapes used in this work (see Additional files1 and 2) should be useful in helping create the initial reference shapes required to implement this method for arbitrary conformer databases. It is also reasonable to believe that the spirit of this methodology could be made to work using other shape searching packages besides ROCS.
Alterations to the AR-0.73 parameters, Transform-Tanimoto and Design-Tanimoto, may be made depending on the desired purpose. If one was only interested in use of this methodology as a shape search screen to dramatically reduce the number of conformers considered prior to shape overlay optimization and to provide reasonable starting points for overlay optimization, reduced values of the two parameters could be used, resulting in substantially fewer reference shapes and a significant reduction in the pre-computation cost. If one was only interested in reproduction of hit lists with shape Tanimoto values of 0.90 or greater, the Transform-Tanimoto could be increased closer to the Design-Tanimoto values, providing a further speed up in the shape search speed by reducing the number of conformers considered by alignment-recycling.
Overall, it appears clear that the AR-0.73 method, while an approximation to the optimal shape overlay, is very capable at routinely producing the vast majority of the ROCS results in a fraction of the CPU time.
One of the main advantages of 3D overlay is that it allows visualization of the superimposed compounds and a better understanding of their similarity. Unfortunately, at the scale of large databases containing millions or billions of conformers, 3D alignment-based similarity searches are reserved to only entities with substantial computing capabilities and modeling resources. Even for such entities, it would be a major breakthrough to get nearly all of the desired alignments in just a couple of minutes using only a single CPU node. The alignment-recycling method described in this work shows promise in dramatically improving the speed of shape similarity searches of large databases through pre-computation of a small subset of shape overlays. Although the pre-computation requires significant computing resources, it is within the reach of modern, yet modest, computer clusters. The pre-computed transformation matrices to obtain the alignments with the subset can be effectively recombined to generate new alignments. Hit lists comparable to the Gaussian shape overlay optimizer ROCS can be obtained 100-times faster with only a small loss in alignment quality for smaller and relatively inflexible molecules. Suggested extensions and modifications to this methodology may prove handy in making 3D similarity a more tractable tool for use on large conformer databases.
The subset of PubChem used for the analysis was extracted using the following protocol:
Extract all the live records from the PubChem Compound  database
Split mixtures into single covalent units
Remove each structure not compliant with MMFF94s as implemented in OMEGA 
Neutralize each ionic structure using a hydrogen atom, if chemically sensible
Remove duplicate structures by comparing CACTVS stereo hash codes 
Remove structures with incomplete stereochemistry (i.e., cis/trans double bonds or R/S stereo centers that are undefined)
Remove structures with more than twenty-seven heavy atoms and more than five rotatable bonds
Build the single conformer dataset using OMEGA 1.8.1 
Build the multiple conformer ensemble using OMEGA 2.0 Beta  and RMSD 1.0 Å spacing
Gaussian shape overlay
The volume of a molecule is generally represented as the finite union of overlapping spheres, each one representing an atom. Although the most intuitive, the hard-sphere model involves complicated analytical expressions and gradient discontinuities. Grant and Pickup  overcame these problems by replacing the hard-sphere density function by a soft-sphere Gaussian equivalent, allowing rapid computation of molecular volumes. The smoothness of the Gaussian function and the simplicity of its derivatives greatly facilitate shape overlay optimizations . Grant and Pickup algorithms are currently implemented in the OpenEye OEShape C++ toolkit . The ROCS application is built using this toolkit. When we refer to ROCS, we are actually referring to the OEShape toolkit.
ROCS provides multiple conformer overlap determination methods. The Grid method is faster when many conformers are fit on a single reference conformer, but it treats all the heavy atoms as carbon, loosing overlap quality in some cases. For initial shape space coverage, we found the Analytic overlap method provided the best trade-off between the speed of the Grid overlap method and the precision of the Exact overlap method. For the alignment-recycling versus ROCS comparison we used the default ROCS Grid approach, as it is the fastest.
In this study, the atom radii used are Delphi radii, available from the OpenEye OEChem C++ library , and only non-hydrogen atoms are considered during shape comparisons. The shape similarity measure used is the Gaussian shape Tanimoto depicted in Eq. 2.
Alignment-recycling evaluates alignment-quality after each matrix multiplication through a single point shape Tanimoto computation. We used our own implementation of the ROCS Grid method. The results from our method are in essence identical to the results produced by ROCS (R2 = 0.9998, SD = 0.00073, with N = 9 401 620 and maximum difference = 0.012).
Diverse reference shape selection
The methodology for reference shape selection has been explained in great detail by Haigh et al. . The dataset of conformers are clustered using a simple sphere exclusion algorithm. In the first step, a starting conformer is randomly selected as a reference shape. In the second step, all the conformers with a shape Tanimoto to the current reference shape greater than a pre-defined cut-off value (i.e., the "Design-Tanimoto" value) are assigned to the current reference shape cluster. For all the unassigned conformers, the shape Tanimoto to the most similar reference shape is stored. In the third step, the one conformer with the lowest stored similarity is selected as a new reference shape. The second and third steps are repeated until all conformers are assigned to a reference shape cluster. The Design-Tanimoto defines the resolution of coverage of the "shape space" of the dataset. The structure of the reference shapes is available in supporting information.
The test set from Oprea et al.  was extracted from the SD File available in the supporting information. Only 65 compounds met the PubChem subset selection criteria, e.g., for size and flexibility. We generated a 3D conformer model for each compound using OMEGA 2.0 Beta  and a RMSD cut-off of 1.0 Å. A single conformer was selected at random for each compound to perform the benchmark speed comparison. The conformer structure coordinates are available in supporting information. CPU time comparisons were performed using 64-bit 3-GHz Intel dual-core Xeon processors on the SuSE Enterprise 9.3 platform.
The authors are thankful to Anthony Nicholls for constructive comments, Wolf-D. Ihlenfeldt who helped write CACTVS scripts, and OpenEye Scientific Software for intuitive insights and useful 3D tools. This research was supported by the Intramural Research Program of the National Institutes of Health, National Library of Medicine.
- Sheridan RP, Kearsley SK: Why do we need so many chemical similarity search methods?. Drug Discovery Today. 2002, 7 (17): 903-10.1016/S1359-6446(02)02411-X.View ArticleGoogle Scholar
- Nikolova N, Jaworska J: Approaches to Measure Chemical Similarity ± a Review. QSAR Comb Sci. 2003, 22: 1006-1026. 10.1002/qsar.200330831.View ArticleGoogle Scholar
- Barbosa F, Horvath D: Molecular similarity and property similarity. Curr Top Med Chem. 2004, 4 (6): 589-600. 10.2174/1568026043451186.View ArticleGoogle Scholar
- Bender A, Glen RC: Molecular similarity: a key technique in molecular informatics. Org Biomol Chem. 2004, 2 (22): 3204-3218. 10.1039/b409813g.View ArticleGoogle Scholar
- Good AC, Hermsmeier MA, Hindle SA: Measuring CAMD technique performance: a virtual screening case study in the design of validation experiments. J Comput Aided Mol Des. 2004, 18 (7-9): 529-536. 10.1007/s10822-004-4067-1.View ArticleGoogle Scholar
- Takahashi Y, Sukekawa M, Sasaki S: Automatic Identification of Molecular Similarity Using Reduced-Graph Representation of Chemical Structure. J Chem Inf Comput Sci. 1992, 32: 639-643. 10.1021/ci00010a009.View ArticleGoogle Scholar
- Rarey M, Dixon JS: Feature trees: a new molecular similarity measure based on tree matching. J Comput Aided Mol Des. 1998, 12 (5): 471-490. 10.1023/A:1008068904628.View ArticleGoogle Scholar
- Barker EJ, Buttar D, Cosgrove DA, Gardiner EJ, Kitts P, Willett P, Gillet VJ: Scaffold hopping using clique detection applied to reduced graphs. J Chem Inf Model. 2006, 46 (2): 503-511. 10.1021/ci050347r.View ArticleGoogle Scholar
- ROCS - Rapid Overlay of Chemical Structures. 2006, OpenEye Scientific Software, Inc., [http://www.eyesopen.com]2.2
- Grant JA, Gallardo MA, Pickup BT: A fast method of molecular shape comparison: A simple application of a Gaussian description of molecular shape. J Comput Chem. 1996, 17 (14): 1653-1666. 10.1002/(SICI)1096-987X(19961115)17:14<1653::AID-JCC7>3.0.CO;2-K.View ArticleGoogle Scholar
- Rush TS, Grant JA, Mosyak L, Nicholls A: A shape-based 3-D scaffold hopping method and its application to a bacterial protein-protein interaction. J Med Chem. 2005, 48 (5): 1489-1495. 10.1021/jm040163o.View ArticleGoogle Scholar
- Gundersen E, Fan K, Haas K, Huryn D, Steven Jacobsen J, Kreft A, Martone R, Mayer S, Sonnenberg-Reines J, Sun SC, Zhou H: Molecular-modeling based design, synthesis, and activity of substituted piperidines as [gamma]-secretase inhibitors. Bioorganic & Medicinal Chemistry Letters. 2005, 15 (7): 1891-10.1016/j.bmcl.2005.02.006.View ArticleGoogle Scholar
- Muchmore SW, Souers AJ, Akritopoulou-Zanze I: The Use of Three-Dimensional Shape and Electrostatic Similarity Searching in the Identification of a Melanin-Concentrating Hormone Receptor 1 Antagonist. Chemical Biology & Drug Design. 2006, 67 (2): 174-176. 10.1111/j.1747-0285.2006.00341.x.View ArticleGoogle Scholar
- Ghuloum AM, Sage CR, Jain AN: Molecular Hashkeys: A Novel Method for Molecular Characterization and Its Application for Predicting Important Pharmaceutical Properties of Molecules. J Med Chem. 1999, 42 (10): 1739-1748. 10.1021/jm980527a.View ArticleGoogle Scholar
- Haigh JA, Pickup BT, Grant JA, Nicholls A: Small Molecule Shape-Fingerprints. J Chem Inf Model. 2005, 45 (3): 673-684. 10.1021/ci049651v.View ArticleGoogle Scholar
- Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, Geer LY, Helmberg W, Kapustin Y, Kenton DL, Khovayko O, Lipman DJ, Madden TL, Maglott DR, Ostell J, Pruitt KD, Schuler GD, Schriml LM, Sequeira E, Sherry ST, Sirotkin K, Souvorov A, Starchenko G, Suzek TO, Tatusov R, Tatusova TA, Wagner L, Yaschenko E: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2006, 34 (Database issue): D173-80. 10.1093/nar/gkj158.View ArticleGoogle Scholar
- Oprea TI, Davis AM, Teague SJ, Leeson PD: Is there a difference between leads and drugs? A historical perspective. J Chem Inf Comput Sci. 2001, 41 (5): 1308-1315. 10.1021/ci010366a.View ArticleGoogle Scholar
- Lipinski CA, Lombardo F, Dominy BW, Feeney PJ: Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Advanced Drug Delivery Reviews. 1997, 23 (1-3): 3-10.1016/S0169-409X(96)00423-1.View ArticleGoogle Scholar
- Halgren TA: MMFF VI. MMFF94s option for energy minimization studies. J Comput Chem. 1999, 20 (7): 720-729. 10.1002/(SICI)1096-987X(199905)20:7<720::AID-JCC7>3.0.CO;2-X.View ArticleGoogle Scholar
- OMEGA. 2006, OpenEye Scientific Software, Inc., [http://www.eyesopen.com]
- Gobbi A, Lee ML: DISE: directed sphere exclusion. J Chem Inf Comput Sci. 2003, 43 (1): 317-323. 10.1021/ci025554v.View ArticleGoogle Scholar
- Bostrom J, Greenwood JR, Gottfries J: Assessing the performance of OMEGA with respect to retrieving bioactive conformations. J Mol Graph Model. 2003, 21 (5): 449-462. 10.1016/S1093-3263(02)00204-8.View ArticleGoogle Scholar
- Ihlenfeldt WD, Gasteiger J: Hash Codes For The Identification And Classification Of Molecular-Structure Elements. J Comput Chem. 1994, 15 (8): 793-813. 10.1002/jcc.540150802.View ArticleGoogle Scholar
- Grant JA, Pickup BT: A Gaussian Description of Molecular Shape. J Phys Chem. 1995, 99 (11): 3503-3510. 10.1021/j100011a016.View ArticleGoogle Scholar
- OEShape - C++ . 2006, Santa Fe , OpenEye Scientific Software, Inc., 1.6.1Google Scholar
- OEChem - C++ . 2006, Santa Fe , OpenEye Scientific Software, Inc., 1.4.2Google Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.