oddt package

Submodules

oddt.datasets module

Datasets wrapped in convenient models

class oddt.datasets.CASF(home)[source]

Bases: object

Load CASF dataset as described in Li, Y. et al. Comparative Assessment of Scoring Functions on an Updated Benchmark: 2. Evaluation Methods and General Results. J. Chem. Inf. Model. 54, 1717-1736. (2014) http://dx.doi.org/10.1021/ci500081m

Parameters
home: string

Path to CASF dataset main directory

Methods

precomputed_score([scoring_function])

Load precomputed results of scoring power test for various scoring functions.

precomputed_screening([scoring_function, …])

Load precomputed results of screening power test for various scoring functions

precomputed_score(scoring_function=None)[source]

Load precomputed results of scoring power test for various scoring functions.

Parameters
scoring_function: string (default=None)

Name of the scoring function to get results If None, all results are returned.

precomputed_screening(scoring_function=None, cluster_id=None)[source]

Load precomputed results of screening power test for various scoring functions

Parameters
scoring_function: string (default=None)

Name of the scoring function to get results If None, all results are returned

cluster_id: int (default=None)

Number of the protein cluster to get results If None, all results are returned

class oddt.datasets.dude(home)[source]

Bases: object

A wrapper for DUD-E (A Database of Useful Decoys: Enhanced) http://dude.docking.org/

Parameters
homestr

Path to files from dud-e

class oddt.datasets.pdbbind(home, version=None, default_set=None, opt=None)[source]

Bases: object

Attributes
activities
ids
property activities
property ids

oddt.fingerprints module

Module checks interactions between two molecules and creates interacion fingerprints.

oddt.fingerprints.ECFP(mol, depth=2, size=4096, count_bits=True, sparse=True, use_pharm_features=False)[source]

Extended connectivity fingerprints (ECFP) with an option to include atom features (FCPF). Depth of a fingerprint is counted as bond-steps, thus the depth for ECFP2 = 1, ECPF4 = 2, ECFP6 = 3, etc.

Reference: Rogers D, Hahn M. Extended-connectivity fingerprints. J Chem Inf Model. 2010;50: 742-754. http://dx.doi.org/10.1021/ci100050t

Parameters
moloddt.toolkit.Molecule object

Input molecule for the FP calculations

depthint (deafult = 2)

The depth of the fingerprint, i.e. the number of bonds in Morgan algorithm. Note: For ECFP2: depth = 1, ECFP4: depth = 2, etc.

sizeint (default = 4096)

Final size of fingerprint to which it is folded.

count_bitsbool (default = True)

Should the bits be counted or unique. In dense representation it translates to integer array (count_bits=True) or boolean array if False.

sparsebool (default=True)

Should fingerprints be dense (contain all bits) or sparse (just the on bits).

use_pharm_featuresbool (default=False)

Switch to use pharmacophoric features as atom representation instead of explicit atomic numbers etc.

Returns
fingerprintnumpy array

Calsulated FP of fixed size (dense) or on bits indices (sparse). Dtype is either integer or boolean.

oddt.fingerprints.InteractionFingerprint(ligand, protein, strict=True)[source]

Interaction fingerprint accomplished by converting the molecular interaction of ligand-protein into bit array according to the residue of choice and the interaction. For every residue (One row = one residue) there are eight bits which represent eight type of interactions:

  • (Column 0) hydrophobic contacts

  • (Column 1) aromatic face to face

  • (Column 2) aromatic edge to face

  • (Column 3) hydrogen bond (protein as hydrogen bond donor)

  • (Column 4) hydrogen bond (protein as hydrogen bond acceptor)

  • (Column 5) salt bridges (protein positively charged)

  • (Column 6) salt bridges (protein negatively charged)

  • (Column 7) salt bridges (ionic bond with metal ion)

Parameters
ligand, proteinoddt.toolkit.Molecule object

Molecules, which are analysed in order to find interactions.

strictbool (deafult = True)

If False, do not include condition, which informs whether atoms form ‘strict’ H-bond (pass all angular cutoffs).

Returns
InteractionFingerprintnumpy array

Vector of calculated IFP (size = no residues * 8 type of interaction)

oddt.fingerprints.PLEC(ligand, protein, depth_ligand=2, depth_protein=4, distance_cutoff=4.5, size=16384, count_bits=True, sparse=True, ignore_hoh=True, bits_info=None)[source]

Protein ligand extended connectivity fingerprint. For every pair of atoms in contact, compute ECFP and then hash every single, corresponding depth.

Parameters
ligand, proteinoddt.toolkit.Molecule object

Molecules, which are analysed in order to find interactions.

depth_ligand, depth_proteinint (deafult = (2, 4))

The depth of the fingerprint, i.e. the number of bonds in Morgan algorithm. Note: For ECFP2: depth = 1, ECFP4: depth = 2, etc.

size: int (default = 16384)

SPLIF is folded to given size.

distance_cutoff: float (default=4.5)

Cutoff distance for close contacts.

sparsebool (default = True)

Should fingerprints be dense (contain all bits) or sparse (just the on bits).

count_bitsbool (default = True)

Should the bits be counted or unique. In dense representation it translates to integer array (count_bits=True) or boolean array if False.

ignore_hohbool (default = True)

Should the water molecules be ignored. This is based on the name of the residue (‘HOH’).

bits_infodict or None (default = None)

If dictionary is provided it is filled with information about bit contents. Root atom index and depth is provided for both ligand and protein. Dictionary is modified in-place.

Returns
PLECnumpy array

fp (size = atoms in contacts * max(depth_protein, depth_ligand))

oddt.fingerprints.SPLIF(ligand, protein, depth=1, size=4096, distance_cutoff=4.5)[source]

Calculates structural protein-ligand interaction fingerprint (SPLIF), based on http://pubs.acs.org/doi/abs/10.1021/ci500319f.

Parameters
ligand, proteinoddt.toolkit.Molecule object

Molecules, which are analysed in order to find interactions.

depthint (deafult = 1)

The depth of the fingerprint, i.e. the number of bonds in Morgan algorithm. Note: For ECFP2: depth = 1, ECFP4: depth = 2, etc.

size: int (default = 4096)

SPLIF is folded to given size.

distance_cutoff: float (default=4.5)

Cutoff distance for close contacts.

Returns
SPLIFnumpy array

Calculated SPLIF.shape = (no. of atoms, ). Every row consists of three elements:

row[0] = index of hashed atoms row[1].shape = (7, 3) -> ligand’s atom coords and 6 his neigbor’s row[2].shape = (7, 3) -> protein’s atom coords and 6 his neigbor’s

oddt.fingerprints.SimpleInteractionFingerprint(ligand, protein, strict=True)[source]

Based on http://dx.doi.org/10.1016/j.csbj.2014.05.004. Every IFP consists of 8 bits per amino acid (One row = one amino acid) and present eight type of interaction:

  • (Column 0) hydrophobic contacts

  • (Column 1) aromatic face to face

  • (Column 2) aromatic edge to face

  • (Column 3) hydrogen bond (protein as hydrogen bond donor)

  • (Column 4) hydrogen bond (protein as hydrogen bond acceptor)

  • (Column 5) salt bridges (protein positively charged)

  • (Column 6) salt bridges (protein negatively charged)

  • (Column 7) salt bridges (ionic bond with metal ion)

Returns matrix, which is sorted according to this pattern : ‘ALA’, ‘ARG’, ‘ASN’, ‘ASP’, ‘CYS’, ‘GLN’, ‘GLU’, ‘GLY’, ‘HIS’, ‘ILE’, ‘LEU’, ‘LYS’, ‘MET’, ‘PHE’, ‘PRO’, ‘SER’, ‘THR’, ‘TRP’, ‘TYR’, ‘VAL’, ‘’. The ‘’ means cofactor. Index of amino acid in pattern coresponds to row in returned matrix.

Parameters
ligand, proteinoddt.toolkit.Molecule object

Molecules, which are analysed in order to find interactions.

strictbool (deafult = True)

If False, do not include condition, which informs whether atoms form ‘strict’ H-bond (pass all angular cutoffs).

Returns
InteractionFingerprintnumpy array

Vector of calculated IFP (size = 168)

oddt.fingerprints.dice(a, b, sparse=False)[source]

Calculates the Dice coefficient, the ratio of the bits in common to the arithmetic mean of the number of ‘on’ bits in the two fingerprints. Supports integer and boolean fingerprints.

Parameters
a, bnumpy array

Interaction fingerprints, which are compared in order to determine similarity.

sparsebool (default=False)

Type of FPs to use. Defaults to dense form.

Returns
scorefloat

Similarity between a, b.

oddt.fingerprints.similarity_SPLIF(reference, query, rmsd_cutoff=1.0)[source]

Calculates similarity between structural interaction fingerprints, based on doi:http://pubs.acs.org/doi/abs/10.1021/ci500319f.

Parameters
reference, query: numpy.array

SPLIFs, which are compared in order to determine similarity.

rmsd_cutoffint (default = 1)

Specific treshold for which, bits are considered as fully matching.

Returns
SimilarityScorefloat

Similarity between given fingerprints.

oddt.fingerprints.tanimoto(a, b, sparse=False)[source]

Tanimoto coefficient, supports boolean fingerprints. Integer fingerprints are casted to boolean.

Parameters
a, bnumpy array

Interaction fingerprints, which are compared in order to determine similarity.

sparsebool (default=False)

Type of FPs to use. Defaults to dense form.

Returns
scorefloat

Similarity between a, b.

oddt.interactions module

Module calculates interactions between two molecules (proein-protein, protein-ligand, small-small). Currently following interacions are implemented:

  • hydrogen bonds

  • halogen bonds

  • pi stacking (parallel and perpendicular)

  • salt bridges

  • hydrophobic contacts

  • pi-cation

  • metal coordination

  • pi-metal

oddt.interactions.acceptor_metal(mol1, mol2, tolerance=30, cutoff=4)[source]

Returns pairs of acceptor-metal atoms, which meet metal coordination criteria Note: This function is directional (mol1 holds acceptors, mol2 holds metals)

Parameters
mol1, mol2oddt.toolkit.Molecule object

Molecules to compute acceptor and metal pairs

cutofffloat, (default=4)

Distance cutoff for A-M pairs

toleranceint, (default=30)

Range (+/- tolerance) from perfect direction defined by atoms hybridization in metal coordination are considered as strict.

Returns
a, datom_dict-type numpy array

Aligned arrays of atoms forming metal coordination, firstly acceptors, secondly metals.

strictnumpy array, dtype=bool

Boolean array align with atom pairs, informing whether atoms form ‘strict’ metal coordination (pass all angular cutoffs). If false, only distance cutoff is met, therefore the interaction is ‘crude’.

oddt.interactions.close_contacts(x, y, cutoff, x_column='coords', y_column='coords', cutoff_low=0.0)[source]

Returns pairs of atoms which are within close contac distance cutoff. The cutoff is semi-inclusive, i.e (cutoff_low, cutoff].

Parameters
x, yatom_dict-type numpy array

Atom dictionaries generated by oddt.toolkit.Molecule objects.

cutofffloat

Cutoff distance for close contacts

x_column, ycolumnstring, (default=’coords’)

Column containing coordinates of atoms (or pseudo-atoms, i.e. ring centroids)

cutoff_lowfloat (default=0.)

Lower bound of contacts to find (exclusive). Zero by default. .. versionadded:: 0.6

Returns
x_, y_atom_dict-type numpy array

Aligned pairs of atoms in close contact for further processing.

oddt.interactions.halogenbond_acceptor_halogen(mol1, mol2, tolerance=30, cutoff=4)[source]

Returns pairs of acceptor-halogen atoms, which meet halogen bond criteria

Parameters
mol1, mol2oddt.toolkit.Molecule object

Molecules to compute halogen bond acceptor and halogen pairs

cutofffloat, (default=4)

Distance cutoff for A-H pairs

toleranceint, (default=30)

Range (+/- tolerance) from perfect direction defined by atoms hybridization in which halogen bonds are considered as strict.

Returns
a, hatom_dict-type numpy array

Aligned arrays of atoms forming halogen bond, firstly acceptors, secondly halogens

strictnumpy array, dtype=bool

Boolean array align with atom pairs, informing whether atoms form ‘strict’ halogen bond (pass all angular cutoffs). If false, only distance cutoff is met, therefore the bond is ‘crude’.

oddt.interactions.halogenbonds(mol1, mol2, cutoff=4, tolerance=30)[source]

Calculates halogen bonds between molecules

Parameters
mol1, mol2oddt.toolkit.Molecule object

Molecules to compute halogen bond acceptor and halogen pairs

cutofffloat, (default=4)

Distance cutoff for A-H pairs

toleranceint, (default=30)

Range (+/- tolerance) from perfect direction defined by atoms hybridization in which halogen bonds are considered as strict.

Returns
mol1_atoms, mol2_atomsatom_dict-type numpy array

Aligned arrays of atoms forming halogen bond

strictnumpy array, dtype=bool

Boolean array align with atom pairs, informing whether atoms form ‘strict’ halogen bond (pass all angular cutoffs). If false, only distance cutoff is met, therefore the bond is ‘crude’.

oddt.interactions.hbond_acceptor_donor(mol1, mol2, cutoff=3.5, tolerance=30, donor_exact=False)[source]

Returns pairs of acceptor-donor atoms, which meet H-bond criteria

Parameters
mol1, mol2oddt.toolkit.Molecule object

Molecules to compute H-bond acceptor and H-bond donor pairs

cutofffloat, (default=3.5)

Distance cutoff for A-D pairs

toleranceint, (default=30)

Range (+/- tolerance) from perfect direction defined by acceptor/donor hybridization in which H-bonds are considered as strict.

donor_exactbool

Use exact protonation states for donors, i.e. require Hs on donor. By default ODDT implies some tautomeric structures as protonated, even if there is no H on specific atom.

Returns
a, datom_dict-type numpy array

Aligned arrays of atoms forming H-bond, firstly acceptors, secondly donors.

strictnumpy array, dtype=bool

Boolean array align with atom pairs, informing whether atoms form ‘strict’ H-bond (pass all angular cutoffs). If false, only distance cutoff is met, therefore the bond is ‘crude’.

oddt.interactions.hbonds(mol1, mol2, cutoff=3.5, tolerance=30, mol1_exact=False, mol2_exact=False)[source]

Calculates H-bonds between molecules

Parameters
mol1, mol2oddt.toolkit.Molecule object

Molecules to compute H-bond acceptor and H-bond donor pairs

cutofffloat, (default=3.5)

Distance cutoff for A-D pairs

toleranceint, (default=30)

Range (+/- tolerance) from perfect direction defined by atoms hybridization in which H-bonds are considered as strict.

mol1_exact, mol2_exactbool

If set to true, exact protonations states are used for respective molecules, i.e. require Hs. By default ODDT implies some tautomeric structures as protonated, even if there is no H on specific atom.

Returns
mol1_atoms, mol2_atomsatom_dict-type numpy array

Aligned arrays of atoms forming H-bond

strictnumpy array, dtype=bool

Boolean array align with atom pairs, informing whether atoms form ‘strict’ H-bond (pass all angular cutoffs). If false, only distance cutoff is met, therefore the bond is ‘crude’.

oddt.interactions.hydrophobic_contacts(mol1, mol2, cutoff=4)[source]

Calculates hydrophobic contacts between molecules

Parameters
mol1, mol2oddt.toolkit.Molecule object

Molecules to compute hydrophobe pairs

cutofffloat, (default=4)

Distance cutoff for hydrophobe pairs

Returns
mol1_atoms, mol2_atomsatom_dict-type numpy array

Aligned arrays of atoms forming hydrophobic contacts

oddt.interactions.pi_cation(mol1, mol2, cutoff=5, tolerance=30, cation_exact=False)[source]

Returns pairs of ring-cation atoms, which meet pi-cation criteria

Parameters
mol1, mol2oddt.toolkit.Molecule object

Molecules to compute ring-cation pairs

cutofffloat, (default=5)

Distance cutoff for Pi-cation pairs

toleranceint, (default=30)

Range (+/- tolerance) from perfect direction (perpendicular) in which pi-cation are considered as strict.

cation_exactbool

Requires interacting atoms to have non-zero formal charge.

Returns
r1ring_dict-type numpy array

Aligned rings forming pi-stacking

plus2atom_dict-type numpy array

Aligned cations forming pi-cation

strict_parallelnumpy array, dtype=bool

Boolean array align with ring-cation pairs, informing whether they form ‘strict’ pi-cation. If false, only distance cutoff is met, therefore the interaction is ‘crude’.

oddt.interactions.pi_metal(mol1, mol2, cutoff=5, tolerance=30)[source]

Returns pairs of ring-metal atoms, which meet pi-metal criteria

Parameters
mol1, mol2oddt.toolkit.Molecule object

Molecules to compute ring-metal pairs

cutofffloat, (default=5)

Distance cutoff for Pi-metal pairs

toleranceint, (default=30)

Range (+/- tolerance) from perfect direction (perpendicular) in which pi-metal are considered as strict.

Returns
r1ring_dict-type numpy array

Aligned rings forming pi-metal

matom_dict-type numpy array

Aligned metals forming pi-metal

strict_parallelnumpy array, dtype=bool

Boolean array align with ring-metal pairs, informing whether they form ‘strict’ pi-metal. If false, only distance cutoff is met, therefore the interaction is ‘crude’.

oddt.interactions.pi_stacking(mol1, mol2, cutoff=5, tolerance=30)[source]

Returns pairs of rings, which meet pi stacking criteria

Parameters
mol1, mol2oddt.toolkit.Molecule object

Molecules to compute ring pairs

cutofffloat, (default=5)

Distance cutoff for Pi-stacking pairs

toleranceint, (default=30)

Range (+/- tolerance) from perfect direction (parallel or perpendicular) in which pi-stackings are considered as strict.

Returns
r1, r2ring_dict-type numpy array

Aligned arrays of rings forming pi-stacking

strict_parallelnumpy array, dtype=bool

Boolean array align with ring pairs, informing whether rings form ‘strict’ parallel pi-stacking. If false, only distance cutoff is met, therefore the stacking is ‘crude’.

strict_perpendicularnumpy array, dtype=bool

Boolean array align with ring pairs, informing whether rings form ‘strict’ perpendicular pi-stacking (T-shaped, T-face, etc.). If false, only distance cutoff is met, therefore the stacking is ‘crude’.

oddt.interactions.salt_bridge_plus_minus(mol1, mol2, cutoff=4, cation_exact=False, anion_exact=False)[source]

Returns pairs of plus-mins atoms, which meet salt bridge criteria

Parameters
mol1, mol2oddt.toolkit.Molecule object

Molecules to compute plus and minus pairs

cutofffloat, (default=4)

Distance cutoff for A-H pairs

cation_exact, anion_exactbool

Requires interacting atoms to have non-zero formal charge.

Returns
plus, minusatom_dict-type numpy array

Aligned arrays of atoms forming salt bridge, firstly plus, secondly minus

oddt.interactions.salt_bridges(mol1, mol2, cutoff=4, mol1_exact=False, mol2_exact=False)[source]

Calculates salt bridges between molecules

Parameters
mol1, mol2oddt.toolkit.Molecule object

Molecules to compute plus and minus pairs

cutofffloat, (default=4)

Distance cutoff for plus-minus pairs

cation_exact, anion_exactbool

Requires interacting atoms to have non-zero formal charge.

Returns
mol1_atoms, mol2_atomsatom_dict-type numpy array

Aligned arrays of atoms forming salt bridges

oddt.metrics module

Metrics for estimating performance of drug discovery methods implemented in ODDT

oddt.metrics.auc(x, y)[source]

Compute Area Under the Curve (AUC) using the trapezoidal rule.

This is a general function, given points on a curve. For computing the area under the ROC-curve, see roc_auc_score(). For an alternative way to summarize a precision-recall curve, see average_precision_score().

Parameters
xndarray of shape (n,)

x coordinates. These must be either monotonic increasing or monotonic decreasing.

yndarray of shape, (n,)

y coordinates.

Returns
aucfloat

See also

roc_auc_score

Compute the area under the ROC curve.

average_precision_score

Compute average precision from prediction scores.

precision_recall_curve

Compute precision-recall pairs for different probability thresholds.

Examples

>>> import numpy as np
>>> from sklearn import metrics
>>> y = np.array([1, 1, 2, 2])
>>> pred = np.array([0.1, 0.4, 0.35, 0.8])
>>> fpr, tpr, thresholds = metrics.roc_curve(y, pred, pos_label=2)
>>> metrics.auc(fpr, tpr)
0.75
oddt.metrics.bedroc(y_true, y_score, alpha=20.0, pos_label=None)[source]

Computes Boltzmann-Enhanced Discrimination of Receiver Operating Characteristic [1]. This function assumes that results are already sorted and samples with best predictions are first.

Parameters
y_truearray, shape=[n_samples]

True binary labels, in range {0,1} or {-1,1}. If positive label is different than 1, it must be explicitly defined.

y_scorearray, shape=[n_samples]

Scores for tested series of samples

alpha: float

Alpha. 1/Alpha should be proportional to the percentage in EF.

pos_label: int

Positive label of samples (if other than 1)

Returns
bedroc_scorefloat

Boltzmann-Enhanced Discrimination of Receiver Operating Characteristic

References

1

Truchon J-F, Bayly CI. Evaluating virtual screening methods: good and bad metrics for the “early recognition” problem. J Chem Inf Model. 2007;47: 488-508. DOI: 10.1021/ci600426e

oddt.metrics.enrichment_factor(y_true, y_score, percentage=1, pos_label=None, kind='fold')[source]

Computes enrichment factor for given percentage, i.e. EF_1% is enrichment factor for first percent of given samples. This function assumes that results are already sorted and samples with best predictions are first.

Parameters
y_truearray, shape=[n_samples]

True binary labels, in range {0,1} or {-1,1}. If positive label is different than 1, it must be explicitly defined.

y_scorearray, shape=[n_samples]

Scores for tested series of samples

percentageint or float

The percentage for which EF is being calculated

pos_label: int

Positive label of samples (if other than 1)

kind: ‘fold’ or ‘percentage’ (default=’fold’)

Two kinds of enrichment factor: fold and percentage. Fold shows the increase over random distribution (1 is random, the higher EF the better enrichment). Percentage returns the fraction of positive labels within the top x% of dataset.

Returns
effloat

Enrichment Factor for given percenage in range 0:1

oddt.metrics.random_roc_log_auc(log_min=0.001, log_max=1.0)[source]

Computes area under semi-log ROC for random distribution.

Parameters
log_minfloat (default=0.001)

Minimum logarithm value for estimating AUC

log_maxfloat (default=1.)

Maximum logarithm value for estimating AUC.

Returns
aucfloat

semi-log ROC AUC for random distribution

oddt.metrics.rie(y_true, y_score, alpha=20, pos_label=None)[source]

Computes Robust Initial Enhancement [1]. This function assumes that results are already sorted and samples with best predictions are first.

Parameters
y_truearray, shape=[n_samples]

True binary labels, in range {0,1} or {-1,1}. If positive label is different than 1, it must be explicitly defined.

y_scorearray, shape=[n_samples]

Scores for tested series of samples

alpha: float

Alpha. 1/Alpha should be proportional to the percentage in EF.

pos_label: int

Positive label of samples (if other than 1)

Returns
rie_scorefloat

Robust Initial Enhancement

References

1

Sheridan, R. P.; Singh, S. B.; Fluder, E. M.; Kearsley, S. K. Protocols for bridging the peptide to nonpeptide gap in topological similarity searches. J. Chem. Inf. Comput. Sci. 2001, 41, 1395-1406. DOI: 10.1021/ci0100144

oddt.metrics.rmse(y_true, y_pred)[source]

Compute Root Mean Squared Error (RMSE)

Parameters
y_truearray-like of shape = [n_samples] or [n_samples, n_outputs]

Ground truth (correct) target values.

y_predarray-like of shape = [n_samples] or [n_samples, n_outputs]

Estimated target values.

Returns
rmsefloat

A positive floating point value (the best value is 0.0).

oddt.metrics.roc(y_true, y_score, *, pos_label=None, sample_weight=None, drop_intermediate=True)

Compute Receiver operating characteristic (ROC).

Note: this implementation is restricted to the binary classification task.

Read more in the User Guide.

Parameters
y_truendarray of shape (n_samples,)

True binary labels. If labels are not either {-1, 1} or {0, 1}, then pos_label should be explicitly given.

y_scorendarray of shape (n_samples,)

Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).

pos_labelint or str, default=None

The label of the positive class. When pos_label=None, if y_true is in {-1, 1} or {0, 1}, pos_label is set to 1, otherwise an error will be raised.

sample_weightarray-like of shape (n_samples,), default=None

Sample weights.

drop_intermediatebool, default=True

Whether to drop some suboptimal thresholds which would not appear on a plotted ROC curve. This is useful in order to create lighter ROC curves.

New in version 0.17: parameter drop_intermediate.

Returns
fprndarray of shape (>2,)

Increasing false positive rates such that element i is the false positive rate of predictions with score >= thresholds[i].

tprndarray of shape (>2,)

Increasing true positive rates such that element i is the true positive rate of predictions with score >= thresholds[i].

thresholdsndarray of shape = (n_thresholds,)

Decreasing thresholds on the decision function used to compute fpr and tpr. thresholds[0] represents no instances being predicted and is arbitrarily set to max(y_score) + 1.

See also

plot_roc_curve

Plot Receiver operating characteristic (ROC) curve.

RocCurveDisplay

ROC Curve visualization.

det_curve

Compute error rates for different probability thresholds.

roc_auc_score

Compute the area under the ROC curve.

Notes

Since the thresholds are sorted from low to high values, they are reversed upon returning them to ensure they correspond to both fpr and tpr, which are sorted in reversed order during their calculation.

References

1

Wikipedia entry for the Receiver operating characteristic

2

Fawcett T. An introduction to ROC analysis[J]. Pattern Recognition Letters, 2006, 27(8):861-874.

Examples

>>> import numpy as np
>>> from sklearn import metrics
>>> y = np.array([1, 1, 2, 2])
>>> scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> fpr, tpr, thresholds = metrics.roc_curve(y, scores, pos_label=2)
>>> fpr
array([0. , 0. , 0.5, 0.5, 1. ])
>>> tpr
array([0. , 0.5, 0.5, 1. , 1. ])
>>> thresholds
array([1.8 , 0.8 , 0.4 , 0.35, 0.1 ])
oddt.metrics.roc_auc(y_true, y_score, pos_label=None, ascending_score=True)[source]

Computes ROC AUC score

Parameters
y_truearray, shape=[n_samples]

True binary labels, in range {0,1} or {-1,1}. If positive label is different than 1, it must be explicitly defined.

y_scorearray, shape=[n_samples]

Scores for tested series of samples

pos_label: int

Positive label of samples (if other than 1)

ascending_score: bool (default=True)

Indicates if your score is ascendig. Ascending score icreases with deacreasing activity. In other words it ascends on ranking list (where actives are on top).

Returns
roc_aucfloat

ROC AUC in range 0:1

oddt.metrics.roc_log_auc(y_true, y_score, pos_label=None, ascending_score=True, log_min=0.001, log_max=1.0)[source]

Computes area under semi-log ROC.

Parameters
y_truearray, shape=[n_samples]

True binary labels, in range {0,1} or {-1,1}. If positive label is different than 1, it must be explicitly defined.

y_scorearray, shape=[n_samples]

Scores for tested series of samples

pos_label: int

Positive label of samples (if other than 1)

ascending_score: bool (default=True)

Indicates if your score is ascendig. Ascending score icreases with deacreasing activity. In other words it ascends on ranking list (where actives are on top).

log_minfloat (default=0.001)

Minimum value for estimating AUC. Lower values will be clipped for numerical stability.

log_maxfloat (default=1.)

Maximum value for estimating AUC. Higher values will be ignored.

Returns
aucfloat

semi-log ROC AUC

oddt.pandas module

oddt.shape module

oddt.shape.common_usr(molecule, ctd=None, cst=None, fct=None, ftf=None, atoms_type=None)[source]

Function used in USR and USRCAT function

Parameters
moleculeoddt.toolkit.Molecule

Molecule to compute USR shape descriptor

ctdnumpy array or None (default = None)

Coordinates of the molecular centroid If ‘None’, the point is calculated

cstnumpy array or None (default = None)

Coordinates of the closest atom to the molecular centroid If ‘None’, the point is calculated

fctnumpy array or None (default = None)

Coordinates of the farthest atom to the molecular centroid If ‘None’, the point is calculated

ftfnumpy array or None (default = None)

Coordinates of the farthest atom to the farthest atom to the molecular centroid If ‘None’, the point is calculated

atoms_typestr or None (default None)

Type of atoms to be selected from atom_dict If ‘None’, all atoms are used to calculate shape descriptor

Returns
shape_descriptornumpy array, shape = (12)

Array describing shape of molecule

oddt.shape.electroshape(mol)[source]

Computes shape descriptor based on Armstrong, M. S. et al. ElectroShape: fast molecular similarity calculations incorporating shape, chirality and electrostatics. J Comput Aided Mol Des 24, 789-801 (2010). http://dx.doi.org/doi:10.1007/s10822-010-9374-0

Aside from spatial coordinates, atoms’ charges are also used as the fourth dimension to describe shape of the molecule.

Parameters
moloddt.toolkit.Molecule

Molecule to compute Electroshape descriptor

Returns
shape_descriptornumpy array, shape = (15)

Array describing shape of molecule

oddt.shape.usr(molecule)[source]

Computes USR shape descriptor based on Ballester PJ, Richards WG (2007). Ultrafast shape recognition to search compound databases for similar molecular shapes. Journal of computational chemistry, 28(10):1711-23. http://dx.doi.org/10.1002/jcc.20681

Parameters
moleculeoddt.toolkit.Molecule

Molecule to compute USR shape descriptor

Returns
shape_descriptornumpy array, shape = (12)

Array describing shape of molecule

oddt.shape.usr_cat(molecule)[source]

Computes USRCAT shape descriptor based on Adrian M Schreyer, Tom Blundell (2012). USRCAT: real-time ultrafast shape recognition with pharmacophoric constraints. Journal of Cheminformatics, 2012 4:27. http://dx.doi.org/10.1186/1758-2946-4-27

Parameters
moleculeoddt.toolkit.Molecule

Molecule to compute USRCAT shape descriptor

Returns
shape_descriptornumpy array, shape = (60)

Array describing shape of molecule

oddt.shape.usr_similarity(mol1_shape, mol2_shape, ow=1.0, hw=1.0, rw=1.0, aw=1.0, dw=1.0)[source]

Computes similarity between molecules

Parameters
mol1_shapenumpy array

USR shape descriptor

mol2_shapenumpy array

USR shape descriptor

owfloat (default = 1.)

Scaling factor for all atoms Only used for USRCAT, ignored for other types

hwfloat (default = 1.)

Scaling factor for hydrophobic atoms Only used for USRCAT, ignored for other types

rwfloat (default = 1.)

Scaling factor for aromatic atoms Only used for USRCAT, ignored for other types

awfloat (default = 1.)

Scaling factor for acceptors Only used for USRCAT, ignored for other types

dwfloat (default = 1.)

Scaling factor for donors Only used for USRCAT, ignored for other types

Returns
similarityfloat from 0 to 1

Similarity between shapes of molecules, 1 indicates identical molecules

oddt.spatial module

Spatial functions included in ODDT Mainly used by other modules, but can be accessed directly.

oddt.spatial.angle(p1, p2, p3)[source]

Returns an angle from a series of 3 points (point #2 is centroid). Angle is returned in degrees.

Parameters
p1,p2,p3numpy arrays, shape = [n_points, n_dimensions]

Triplets of points in n-dimensional space, aligned in rows.

Returns
anglesnumpy array, shape = [n_points]

Series of angles in degrees

oddt.spatial.angle_2v(v1, v2)[source]

Returns an angle between two vecors.Angle is returned in degrees.

Parameters
v1,v2numpy arrays, shape = [n_vectors, n_dimensions]

Pairs of vectors in n-dimensional space, aligned in rows.

Returns
anglesnumpy array, shape = [n_vectors]

Series of angles in degrees

oddt.spatial.dihedral(p1, p2, p3, p4)[source]

Returns an dihedral angle from a series of 4 points. Dihedral is returned in degrees. Function distingishes clockwise and antyclockwise dihedrals.

Parameters
p1, p2, p3, p4numpy arrays, shape = [n_points, n_dimensions]

Quadruplets of points in n-dimensional space, aligned in rows.

Returns
anglesnumpy array, shape = [n_points]

Series of angles in degrees

oddt.spatial.distance(x, y)[source]

Computes distance between each pair of points from x and y.

Parameters
xnumpy arrays, shape = [n_x, 3]

Array of poinds in 3D

ynumpy arrays, shape = [n_y, 3]

Array of poinds in 3D

Returns
dist_matrixnumpy arrays, shape = [n_x, n_y]

Distance matrix

oddt.spatial.rmsd(ref, mol, ignore_h=True, method=None, normalize=False)[source]

Computes root mean square deviation (RMSD) between two molecules (including or excluding Hydrogens). No symmetry checks are performed.

Parameters
refoddt.toolkit.Molecule object

Reference molecule for the RMSD calculation

moloddt.toolkit.Molecule object

Query molecule for RMSD calculation

ignore_hbool (default=False)

Flag indicating to ignore Hydrogen atoms while performing RMSD calculation. This toggle works only with ‘hungarian’ method and without sorting (method=None).

methodstr (default=None)

The method to be used for atom asignment between ref and mol. None means that direct matching is applied, which is the default behavior. Available methods:

  • canonize - match heavy atoms using canonical ordering (it forces

ignoring H’s) - hungarian - minimize RMSD using Hungarian algorithm - min_symmetry - makes multiple molecule-molecule matches and finds minimal RMSD (the slowest). Hydrogens are ignored.

normalizebool (default=False)

Normalize RMSD by square root of rot. bonds

Returns
rmsdfloat

RMSD between two molecules

oddt.spatial.rotate(coords, alpha, beta, gamma)[source]

Rotate coords by cerain angle in X, Y, Z. Angles are specified in radians.

Parameters
coordsnumpy arrays, shape = [n_points, 3]

Coordinates in 3-dimensional space.

alpha, beta, gamma: float

Angles to rotate the coordinates along X, Y and Z axis. Angles are specified in radians.

Returns
new_coordsnumpy arrays, shape = [n_points, 3]

Rorated coordinates in 3-dimensional space.

oddt.surface module

This module generates and does computation with molecular surfaces.

oddt.surface.find_surface_residues(molecule, max_dist=None, scaling=1.0)[source]

Finds residues close to the molecular surface using generate_surface_marching_cubes. Ignores hydrogens and waters present in the molecule.

Parameters
moleculeoddt.toolkit.Molecule

Molecule to find surface residues in.

max_distarray_like, numeric or None (default = None)

Maximum distance from the surface where residues would still be considered close. If None, compares distances to radii of respective atoms.

scalingfloat (default = 1.0)

Expands the grid in which computation is done by generate_surface_marching_cubes by a factor of scaling. Results in a more accurate representation of the surface, and therefore more accurate computation of distances but increases computation time.

Returns
atom_dictnumpy array

An atom_dict containing only the surface residues from the original molecule.

oddt.surface.generate_surface_marching_cubes(molecule, remove_hoh=False, scaling=1.0, probe_radius=1.4)[source]

Generates a molecular surface mesh using the marching_cubes method from scikit-image. Ignores hydrogens present in the molecule.

Parameters
moleculeoddt.toolkit.Molecule object

Molecule for which the surface will be generated

remove_hohbool (default = False)

If True, remove waters from the molecule before generating the surface. Requires molecule.protein to be set to True.

scalingfloat (default = 1.0)

Expands the grid in which computation is done by a factor of scaling. Results in a more accurate representation of the surface, but increases computation time.

probe_radiusfloat (default = 1.4)

Radius of a ball used to patch up holes inside the molecule resulting from some molecular distances being larger (usually in protein). Basically reduces the surface to one accesible by other molecules of radius smaller than probe_radius.

Returns
vertsnumpy array

Spatial coordinates for mesh vertices.

facesnumpy array

Faces are defined by referencing vertices from verts.

oddt.utils module

Common utilities for ODDT

oddt.utils.check_molecule(mol, force_protein=False, force_coords=False, non_zero_atoms=False)[source]

Universal validator of molecule objects. Usage of positional arguments is allowed only for molecule object, otherwise it is prohibitted (i.e. the order of arguments will change). Desired properties of molecule are validated based on specified arguments. By default only the object type is checked. In case of discrepancy to the specification a ValueError is raised with appropriate message.

New in version 0.6.

Parameters
mol: oddt.toolkit.Molecule object

Object to verify

force_protein: bool (default=False)

Force the molecule to be marked as protein (mol.protein).

force_coords: bool (default=False)

Force the molecule to have non-zero coordinates.

non_zero_atoms: bool (default=False)

Check if molecule has at least one atom.

oddt.utils.chunker(iterable, chunksize=100)[source]

Generate chunks from a generator object. If iterable is passed which is not a generator it will be converted to one with iter().

New in version 0.6.

oddt.utils.compose_iter(iterable, funcs)[source]

Chain functions and apply them to iterable, by exhausting the iterable. Functions are executed in the order from funcs.

New in version 0.6.

oddt.utils.is_molecule(obj)[source]

Check whether an object is an oddt.toolkits.{rdk,ob}.Molecule instance.

New in version 0.6.

oddt.utils.is_openbabel_molecule(obj)[source]

Check whether an object is an oddt.toolkits.ob.Molecule instance.

New in version 0.6.

oddt.utils.is_rdkit_molecule(obj)[source]

Check whether an object is an oddt.toolkits.rdk.Molecule instance.

New in version 0.6.

oddt.utils.method_caller(obj, methodname, *args, **kwargs)[source]

Helper function to workaround Python 2 pickle limitations to parallelize methods and generator objects

oddt.virtualscreening module

ODDT pipeline framework for virtual screening

class oddt.virtualscreening.virtualscreening(n_cpu=- 1, verbose=False, chunksize=100)[source]

Bases: object

Virtual Screening pipeline stack

Parameters
n_cpu: int (default=-1)

The number of parallel procesors to use

verbose: bool (default=False)

Verbosity flag for some methods

Methods

apply_filter(expression[, soft_fail])

Filtering method, can use raw expressions (strings to be evaled in if statement, can use oddt.toolkit.Molecule methods, eg. mol.molwt < 500) Currently supported presets: * Lipinski Rule of 5 (‘ro5’ or ‘l5’) * Fragment Rule of 3 (‘ro3’) * PAINS filter (‘pains’).

dock(engine, protein, *args, **kwargs)

Docking procedure.

fetch()

A method to exhaust the pipeline.

load_ligands(fmt, ligands_file, **kwargs)

Loads file with ligands.

score(function[, protein])

Scoring procedure compatible with any scoring function implemented in ODDT and other pickled SFs which are subclasses of oddt.scoring.scorer.

similarity(method, query[, cutoff, protein])

Similarity filter. Supported structural methods:

write(fmt, filename[, csv_filename])

Outputs molecules to a file

write_csv(csv_filename[, fields, keep_pipe])

Outputs molecules to a csv file

apply_filter(expression, soft_fail=0)[source]

Filtering method, can use raw expressions (strings to be evaled in if statement, can use oddt.toolkit.Molecule methods, eg. mol.molwt < 500) Currently supported presets:

  • Lipinski Rule of 5 (‘ro5’ or ‘l5’)

  • Fragment Rule of 3 (‘ro3’)

  • PAINS filter (‘pains’)

Parameters
expression: string or list of strings

Expresion(s) to be used while filtering.

soft_fail: int (default=0)

The number of faulures molecule can have to pass filter, aka. soft-fails.

dock(engine, protein, *args, **kwargs)[source]

Docking procedure.

Parameters
engine: string

Which docking engine to use.

Notes

Additional parameters are passed directly to the engine. Following docking engines are supported:

1. Audodock Vina (`engine="autodock_vina"`), see oddt.docking.autodock_vina.

fetch()[source]

A method to exhaust the pipeline. Itself it is lazy (a generator)

load_ligands(fmt, ligands_file, **kwargs)[source]

Loads file with ligands.

Parameters
file_type: string

Type of molecular file

ligands_file: string

Path to a file, which is loaded to pipeline

score(function, protein=None, *args, **kwargs)[source]

Scoring procedure compatible with any scoring function implemented in ODDT and other pickled SFs which are subclasses of oddt.scoring.scorer.

Parameters
function: string

Which scoring function to use.

protein: oddt.toolkit.Molecule

Default protein to use as reference

Notes

Additional parameters are passed directly to the scoring function.

similarity(method, query, cutoff=0.9, protein=None)[source]
Similarity filter. Supported structural methods:
  • ift: interaction fingerprints

  • sift: simple interaction fingerprints

  • usr: Ultrafast Shape recognition

  • usr_cat: Ultrafast Shape recognition, Credo Atom Types

  • electroshape: Electroshape, an USR method including partial charges

Parameters
method: string

Similarity method used to compare molecules. Avaiale methods: * ifp - interaction fingerprint (requires a receptor) * sifp - simple interaction fingerprint (requires a receptor) * usr - Ultrafast Shape Reckognition * usr_cat - USR, with CREDO atom types * electroshape - Electroshape, USR with moments representing partial charge

query: oddt.toolkit.Molecule or list of oddt.toolkit.Molecule

Query molecules to compare the pipeline to.

cutoff: float

Similarity cutoff for filtering molecules. Any similarity lower than it will be filtered out.

protein: oddt.toolkit.Molecule (default = None)

Protein for underling method. By default it’s empty, but sturctural fingerprints need one.

write(fmt, filename, csv_filename=None, **kwargs)[source]

Outputs molecules to a file

Parameters
file_type: string

Type of molecular file

ligands_file: string

Path to a output file

csv_filename: string

Optional path to a CSV file

write_csv(csv_filename, fields=None, keep_pipe=False, **kwargs)[source]

Outputs molecules to a csv file

Parameters
csv_filename: string

Optional path to a CSV file

fields: list (default None)

List of fields to save in CSV file

keep_pipe: bool (default=False)

If set to True, the ligand pipe is sustained.

Module contents

Open Drug Discovery Toolkit

Universal and easy to use resource for various drug discovery tasks, ie docking, virutal screening, rescoring.

Attributes

toolkitmodule,

Toolkits backend module, currenlty OpenBabel [ob] and RDKit [rdk]. This setting is toolkit-wide, and sets given toolkit as default