Fast follow-up SAR diverse screening library
- 20 000+ clusters, 155 000+ compounds
- 5-10+ compounds per cluster: MedChem-consistent, SAR-ready sets
- Inter-cluster similarity < 0.2 (diverse)
- Intra-cluster similarity 0.5-0.85 (reasonably similar)
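To make the similarity ranges listed above concrete, here is a minimal sketch of the Tanimoto coefficient, with fingerprints represented as sets of on-bit indices. The function name and the toy bit sets are illustrative assumptions, not real molecules; in practice the fingerprints would be ECFP4 2048-bit vectors produced by a cheminformatics toolkit.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


# Toy fingerprints (hypothetical bit indices, for illustration only)
fp_a = {1, 2, 3, 4, 5, 6, 7, 8}
fp_b = {1, 2, 3, 4, 5, 6, 9, 10}        # shares 6 bits with fp_a
fp_c = {1, 20, 21, 22, 23, 24, 25, 26}  # shares 1 bit with fp_a

sim_ab = tanimoto(fp_a, fp_b)  # 6 / (8 + 8 - 6) = 0.6, inside the 0.5-0.85 intra-cluster range
sim_ac = tanimoto(fp_a, fp_c)  # 1 / (8 + 8 - 1), about 0.067, below the 0.2 inter-cluster level
```

In this toy case fp_a and fp_b would count as "reasonably similar" cluster mates, while fp_a and fp_c would count as "diverse".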
The preparation algorithm:
1. Apply REOS, MedChem and PAINS filters to the entire 1.6M ChemDiv inventory to remove reactive (e.g. covalent inhibitors), toxic, promiscuous and other undesirable structural motifs
2. Apply physicochemical property filters (Lipinski Ro5) to remove non-druglike molecules
3. For the remaining selection, apply the in-house sphere exclusion ("Batch Butina") algorithm to cluster the compounds and pick the most diverse ones at the same time
4. Perform final post-processing to ensure that:
a. Only clusters with at least 5 representative molecules are kept, so that only SAR-ready sets are preserved (singletons, for example, are excluded)
b. The inter-cluster similarity is no more than 0.7 (Tanimoto, ECFP4 2048), which ensures cluster diversity
c. No two compounds within a cluster are more than 0.85 similar (Tanimoto, ECFP4 2048), which gives more meaningful SAR (no overly trivial substitutions)
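Steps 3 and 4 can be sketched roughly as follows. This is not the in-house "Batch Butina" implementation, whose details are not given here; it is a generic Taylor-Butina style sphere exclusion on toy data. The function names, the 0.5 clustering cutoff, the centroid-to-centroid check for inter-cluster similarity and the order of the post-processing checks are all assumptions made for illustration.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def butina_clusters(fps, cutoff):
    """Generic Taylor-Butina sphere exclusion; returns index lists, centroid first."""
    n = len(fps)
    neighbors = [
        {j for j in range(n) if j != i and tanimoto(fps[i], fps[j]) >= cutoff}
        for i in range(n)
    ]
    # Visit compounds with the most neighbours first (ties broken by index).
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))
    assigned, clusters = set(), []
    for i in order:
        if i in assigned:
            continue
        members = [i] + sorted(neighbors[i] - assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


def postprocess(clusters, fps, min_size=5, max_intra=0.85, max_inter=0.7):
    """Apply checks (c), (a), (b) from step 4, in that (assumed) order."""
    kept = []
    for members in clusters:
        # (c) drop compounds that are near-duplicates of an already kept member
        pruned = []
        for idx in members:
            if all(tanimoto(fps[idx], fps[k]) <= max_intra for k in pruned):
                pruned.append(idx)
        # (a) keep only SAR-ready clusters
        if len(pruned) < min_size:
            continue
        # (b) assumption: compare centroids (first members) of kept clusters
        if all(tanimoto(fps[pruned[0]], fps[c[0]]) <= max_inter for c in kept):
            kept.append(pruned)
    return kept


# Toy fingerprints (hypothetical bit indices, not real molecules)
core_a, core_b = frozenset(range(10)), frozenset(range(50, 60))
fps = [core_a | {100 + 2 * i, 101 + 2 * i} for i in range(6)]  # family A, 6 analogues
fps.append(fps[0] | {200})                                     # near-duplicate of compound 0
fps += [core_b | {300 + 2 * i, 301 + 2 * i} for i in range(3)]  # family B, only 3 members
fps.append(frozenset({400, 401, 402}))                         # singleton

clusters = butina_clusters(fps, cutoff=0.5)
library = postprocess(clusters, fps)
```

On this toy input the near-duplicate, the three-member family and the singleton are all removed, leaving one six-compound cluster. At the scale of a real inventory the all-pairs neighbour search shown here would be too slow, which is presumably why a batched variant is used.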
The library properties:
1. 155 691 Ro5-compliant compounds in total, all of which pass the PAINS and REOS filters
2. 21 468 compound clusters, each with no fewer than 5 compounds
3. Most clusters are built around a single scaffold, but some show a degree of scaffold diversity while preserving fingerprint similarity, which should support fast and reliable SAR establishment