oddt package¶
Subpackages¶
Submodules¶
oddt.datasets module¶
Datasets wrapped in convenient models
-
class
oddt.datasets.
CASF
(home)[source]¶ Bases:
object
Load CASF dataset as described in Li, Y. et al. Comparative Assessment of Scoring Functions on an Updated Benchmark: 2. Evaluation Methods and General Results. J. Chem. Inf. Model. 54, 1717-1736. (2014) http://dx.doi.org/10.1021/ci500081m
- Parameters
- home: string
Path to CASF dataset main directory
Methods
precomputed_score
([scoring_function])Load precomputed results of scoring power test for various scoring functions.
precomputed_screening
([scoring_function, …])Load precomputed results of screening power test for various scoring functions
-
precomputed_score
(scoring_function=None)[source]¶ Load precomputed results of scoring power test for various scoring functions.
- Parameters
- scoring_function: string (default=None)
Name of the scoring function to get results If None, all results are returned.
-
precomputed_screening
(scoring_function=None, cluster_id=None)[source]¶ Load precomputed results of screening power test for various scoring functions
- Parameters
- scoring_function: string (default=None)
Name of the scoring function to get results If None, all results are returned
- cluster_id: int (default=None)
Number of the protein cluster to get results If None, all results are returned
-
class
oddt.datasets.
dude
(home)[source]¶ Bases:
object
A wrapper for DUD-E (A Database of Useful Decoys: Enhanced) http://dude.docking.org/
- Parameters
- homestr
Path to files from dud-e
oddt.fingerprints module¶
Module checks interactions between two molecules and creates interacion fingerprints.
-
oddt.fingerprints.
ECFP
(mol, depth=2, size=4096, count_bits=True, sparse=True, use_pharm_features=False)[source]¶ Extended connectivity fingerprints (ECFP) with an option to include atom features (FCPF). Depth of a fingerprint is counted as bond-steps, thus the depth for ECFP2 = 1, ECPF4 = 2, ECFP6 = 3, etc.
Reference: Rogers D, Hahn M. Extended-connectivity fingerprints. J Chem Inf Model. 2010;50: 742-754. http://dx.doi.org/10.1021/ci100050t
- Parameters
- moloddt.toolkit.Molecule object
Input molecule for the FP calculations
- depthint (deafult = 2)
The depth of the fingerprint, i.e. the number of bonds in Morgan algorithm. Note: For ECFP2: depth = 1, ECFP4: depth = 2, etc.
- sizeint (default = 4096)
Final size of fingerprint to which it is folded.
- count_bitsbool (default = True)
Should the bits be counted or unique. In dense representation it translates to integer array (count_bits=True) or boolean array if False.
- sparsebool (default=True)
Should fingerprints be dense (contain all bits) or sparse (just the on bits).
- use_pharm_featuresbool (default=False)
Switch to use pharmacophoric features as atom representation instead of explicit atomic numbers etc.
- Returns
- fingerprintnumpy array
Calsulated FP of fixed size (dense) or on bits indices (sparse). Dtype is either integer or boolean.
-
oddt.fingerprints.
InteractionFingerprint
(ligand, protein, strict=True)[source]¶ Interaction fingerprint accomplished by converting the molecular interaction of ligand-protein into bit array according to the residue of choice and the interaction. For every residue (One row = one residue) there are eight bits which represent eight type of interactions:
(Column 0) hydrophobic contacts
(Column 1) aromatic face to face
(Column 2) aromatic edge to face
(Column 3) hydrogen bond (protein as hydrogen bond donor)
(Column 4) hydrogen bond (protein as hydrogen bond acceptor)
(Column 5) salt bridges (protein positively charged)
(Column 6) salt bridges (protein negatively charged)
(Column 7) salt bridges (ionic bond with metal ion)
- Parameters
- ligand, proteinoddt.toolkit.Molecule object
Molecules, which are analysed in order to find interactions.
- strictbool (deafult = True)
If False, do not include condition, which informs whether atoms form ‘strict’ H-bond (pass all angular cutoffs).
- Returns
- InteractionFingerprintnumpy array
Vector of calculated IFP (size = no residues * 8 type of interaction)
-
oddt.fingerprints.
PLEC
(ligand, protein, depth_ligand=2, depth_protein=4, distance_cutoff=4.5, size=16384, count_bits=True, sparse=True, ignore_hoh=True, bits_info=None)[source]¶ Protein ligand extended connectivity fingerprint. For every pair of atoms in contact, compute ECFP and then hash every single, corresponding depth.
- Parameters
- ligand, proteinoddt.toolkit.Molecule object
Molecules, which are analysed in order to find interactions.
- depth_ligand, depth_proteinint (deafult = (2, 4))
The depth of the fingerprint, i.e. the number of bonds in Morgan algorithm. Note: For ECFP2: depth = 1, ECFP4: depth = 2, etc.
- size: int (default = 16384)
SPLIF is folded to given size.
- distance_cutoff: float (default=4.5)
Cutoff distance for close contacts.
- sparsebool (default = True)
Should fingerprints be dense (contain all bits) or sparse (just the on bits).
- count_bitsbool (default = True)
Should the bits be counted or unique. In dense representation it translates to integer array (count_bits=True) or boolean array if False.
- ignore_hohbool (default = True)
Should the water molecules be ignored. This is based on the name of the residue (‘HOH’).
- bits_infodict or None (default = None)
If dictionary is provided it is filled with information about bit contents. Root atom index and depth is provided for both ligand and protein. Dictionary is modified in-place.
- Returns
- PLECnumpy array
fp (size = atoms in contacts * max(depth_protein, depth_ligand))
-
oddt.fingerprints.
SPLIF
(ligand, protein, depth=1, size=4096, distance_cutoff=4.5)[source]¶ Calculates structural protein-ligand interaction fingerprint (SPLIF), based on http://pubs.acs.org/doi/abs/10.1021/ci500319f.
- Parameters
- ligand, proteinoddt.toolkit.Molecule object
Molecules, which are analysed in order to find interactions.
- depthint (deafult = 1)
The depth of the fingerprint, i.e. the number of bonds in Morgan algorithm. Note: For ECFP2: depth = 1, ECFP4: depth = 2, etc.
- size: int (default = 4096)
SPLIF is folded to given size.
- distance_cutoff: float (default=4.5)
Cutoff distance for close contacts.
- Returns
- SPLIFnumpy array
Calculated SPLIF.shape = (no. of atoms, ). Every row consists of three elements:
row[0] = index of hashed atoms row[1].shape = (7, 3) -> ligand’s atom coords and 6 his neigbor’s row[2].shape = (7, 3) -> protein’s atom coords and 6 his neigbor’s
-
oddt.fingerprints.
SimpleInteractionFingerprint
(ligand, protein, strict=True)[source]¶ Based on http://dx.doi.org/10.1016/j.csbj.2014.05.004. Every IFP consists of 8 bits per amino acid (One row = one amino acid) and present eight type of interaction:
(Column 0) hydrophobic contacts
(Column 1) aromatic face to face
(Column 2) aromatic edge to face
(Column 3) hydrogen bond (protein as hydrogen bond donor)
(Column 4) hydrogen bond (protein as hydrogen bond acceptor)
(Column 5) salt bridges (protein positively charged)
(Column 6) salt bridges (protein negatively charged)
(Column 7) salt bridges (ionic bond with metal ion)
Returns matrix, which is sorted according to this pattern : ‘ALA’, ‘ARG’, ‘ASN’, ‘ASP’, ‘CYS’, ‘GLN’, ‘GLU’, ‘GLY’, ‘HIS’, ‘ILE’, ‘LEU’, ‘LYS’, ‘MET’, ‘PHE’, ‘PRO’, ‘SER’, ‘THR’, ‘TRP’, ‘TYR’, ‘VAL’, ‘’. The ‘’ means cofactor. Index of amino acid in pattern coresponds to row in returned matrix.
- Parameters
- ligand, proteinoddt.toolkit.Molecule object
Molecules, which are analysed in order to find interactions.
- strictbool (deafult = True)
If False, do not include condition, which informs whether atoms form ‘strict’ H-bond (pass all angular cutoffs).
- Returns
- InteractionFingerprintnumpy array
Vector of calculated IFP (size = 168)
-
oddt.fingerprints.
dice
(a, b, sparse=False)[source]¶ Calculates the Dice coefficient, the ratio of the bits in common to the arithmetic mean of the number of ‘on’ bits in the two fingerprints. Supports integer and boolean fingerprints.
- Parameters
- a, bnumpy array
Interaction fingerprints, which are compared in order to determine similarity.
- sparsebool (default=False)
Type of FPs to use. Defaults to dense form.
- Returns
- scorefloat
Similarity between a, b.
-
oddt.fingerprints.
similarity_SPLIF
(reference, query, rmsd_cutoff=1.0)[source]¶ Calculates similarity between structural interaction fingerprints, based on doi:http://pubs.acs.org/doi/abs/10.1021/ci500319f.
- Parameters
- reference, query: numpy.array
SPLIFs, which are compared in order to determine similarity.
- rmsd_cutoffint (default = 1)
Specific treshold for which, bits are considered as fully matching.
- Returns
- SimilarityScorefloat
Similarity between given fingerprints.
-
oddt.fingerprints.
tanimoto
(a, b, sparse=False)[source]¶ Tanimoto coefficient, supports boolean fingerprints. Integer fingerprints are casted to boolean.
- Parameters
- a, bnumpy array
Interaction fingerprints, which are compared in order to determine similarity.
- sparsebool (default=False)
Type of FPs to use. Defaults to dense form.
- Returns
- scorefloat
Similarity between a, b.
oddt.interactions module¶
Module calculates interactions between two molecules (proein-protein, protein-ligand, small-small). Currently following interacions are implemented:
hydrogen bonds
halogen bonds
pi stacking (parallel and perpendicular)
salt bridges
hydrophobic contacts
pi-cation
metal coordination
pi-metal
-
oddt.interactions.
acceptor_metal
(mol1, mol2, tolerance=30, cutoff=4)[source]¶ Returns pairs of acceptor-metal atoms, which meet metal coordination criteria Note: This function is directional (mol1 holds acceptors, mol2 holds metals)
- Parameters
- mol1, mol2oddt.toolkit.Molecule object
Molecules to compute acceptor and metal pairs
- cutofffloat, (default=4)
Distance cutoff for A-M pairs
- toleranceint, (default=30)
Range (+/- tolerance) from perfect direction defined by atoms hybridization in metal coordination are considered as strict.
- Returns
- a, datom_dict-type numpy array
Aligned arrays of atoms forming metal coordination, firstly acceptors, secondly metals.
- strictnumpy array, dtype=bool
Boolean array align with atom pairs, informing whether atoms form ‘strict’ metal coordination (pass all angular cutoffs). If false, only distance cutoff is met, therefore the interaction is ‘crude’.
-
oddt.interactions.
close_contacts
(x, y, cutoff, x_column='coords', y_column='coords', cutoff_low=0.0)[source]¶ Returns pairs of atoms which are within close contac distance cutoff. The cutoff is semi-inclusive, i.e (cutoff_low, cutoff].
- Parameters
- x, yatom_dict-type numpy array
Atom dictionaries generated by oddt.toolkit.Molecule objects.
- cutofffloat
Cutoff distance for close contacts
- x_column, ycolumnstring, (default=’coords’)
Column containing coordinates of atoms (or pseudo-atoms, i.e. ring centroids)
- cutoff_lowfloat (default=0.)
Lower bound of contacts to find (exclusive). Zero by default. .. versionadded:: 0.6
- Returns
- x_, y_atom_dict-type numpy array
Aligned pairs of atoms in close contact for further processing.
-
oddt.interactions.
halogenbond_acceptor_halogen
(mol1, mol2, tolerance=30, cutoff=4)[source]¶ Returns pairs of acceptor-halogen atoms, which meet halogen bond criteria
- Parameters
- mol1, mol2oddt.toolkit.Molecule object
Molecules to compute halogen bond acceptor and halogen pairs
- cutofffloat, (default=4)
Distance cutoff for A-H pairs
- toleranceint, (default=30)
Range (+/- tolerance) from perfect direction defined by atoms hybridization in which halogen bonds are considered as strict.
- Returns
- a, hatom_dict-type numpy array
Aligned arrays of atoms forming halogen bond, firstly acceptors, secondly halogens
- strictnumpy array, dtype=bool
Boolean array align with atom pairs, informing whether atoms form ‘strict’ halogen bond (pass all angular cutoffs). If false, only distance cutoff is met, therefore the bond is ‘crude’.
-
oddt.interactions.
halogenbonds
(mol1, mol2, cutoff=4, tolerance=30)[source]¶ Calculates halogen bonds between molecules
- Parameters
- mol1, mol2oddt.toolkit.Molecule object
Molecules to compute halogen bond acceptor and halogen pairs
- cutofffloat, (default=4)
Distance cutoff for A-H pairs
- toleranceint, (default=30)
Range (+/- tolerance) from perfect direction defined by atoms hybridization in which halogen bonds are considered as strict.
- Returns
- mol1_atoms, mol2_atomsatom_dict-type numpy array
Aligned arrays of atoms forming halogen bond
- strictnumpy array, dtype=bool
Boolean array align with atom pairs, informing whether atoms form ‘strict’ halogen bond (pass all angular cutoffs). If false, only distance cutoff is met, therefore the bond is ‘crude’.
-
oddt.interactions.
hbond_acceptor_donor
(mol1, mol2, cutoff=3.5, tolerance=30, donor_exact=False)[source]¶ Returns pairs of acceptor-donor atoms, which meet H-bond criteria
- Parameters
- mol1, mol2oddt.toolkit.Molecule object
Molecules to compute H-bond acceptor and H-bond donor pairs
- cutofffloat, (default=3.5)
Distance cutoff for A-D pairs
- toleranceint, (default=30)
Range (+/- tolerance) from perfect direction defined by acceptor/donor hybridization in which H-bonds are considered as strict.
- donor_exactbool
Use exact protonation states for donors, i.e. require Hs on donor. By default ODDT implies some tautomeric structures as protonated, even if there is no H on specific atom.
- Returns
- a, datom_dict-type numpy array
Aligned arrays of atoms forming H-bond, firstly acceptors, secondly donors.
- strictnumpy array, dtype=bool
Boolean array align with atom pairs, informing whether atoms form ‘strict’ H-bond (pass all angular cutoffs). If false, only distance cutoff is met, therefore the bond is ‘crude’.
-
oddt.interactions.
hbonds
(mol1, mol2, cutoff=3.5, tolerance=30, mol1_exact=False, mol2_exact=False)[source]¶ Calculates H-bonds between molecules
- Parameters
- mol1, mol2oddt.toolkit.Molecule object
Molecules to compute H-bond acceptor and H-bond donor pairs
- cutofffloat, (default=3.5)
Distance cutoff for A-D pairs
- toleranceint, (default=30)
Range (+/- tolerance) from perfect direction defined by atoms hybridization in which H-bonds are considered as strict.
- mol1_exact, mol2_exactbool
If set to true, exact protonations states are used for respective molecules, i.e. require Hs. By default ODDT implies some tautomeric structures as protonated, even if there is no H on specific atom.
- Returns
- mol1_atoms, mol2_atomsatom_dict-type numpy array
Aligned arrays of atoms forming H-bond
- strictnumpy array, dtype=bool
Boolean array align with atom pairs, informing whether atoms form ‘strict’ H-bond (pass all angular cutoffs). If false, only distance cutoff is met, therefore the bond is ‘crude’.
-
oddt.interactions.
hydrophobic_contacts
(mol1, mol2, cutoff=4)[source]¶ Calculates hydrophobic contacts between molecules
- Parameters
- mol1, mol2oddt.toolkit.Molecule object
Molecules to compute hydrophobe pairs
- cutofffloat, (default=4)
Distance cutoff for hydrophobe pairs
- Returns
- mol1_atoms, mol2_atomsatom_dict-type numpy array
Aligned arrays of atoms forming hydrophobic contacts
-
oddt.interactions.
pi_cation
(mol1, mol2, cutoff=5, tolerance=30, cation_exact=False)[source]¶ Returns pairs of ring-cation atoms, which meet pi-cation criteria
- Parameters
- mol1, mol2oddt.toolkit.Molecule object
Molecules to compute ring-cation pairs
- cutofffloat, (default=5)
Distance cutoff for Pi-cation pairs
- toleranceint, (default=30)
Range (+/- tolerance) from perfect direction (perpendicular) in which pi-cation are considered as strict.
- cation_exactbool
Requires interacting atoms to have non-zero formal charge.
- Returns
- r1ring_dict-type numpy array
Aligned rings forming pi-stacking
- plus2atom_dict-type numpy array
Aligned cations forming pi-cation
- strict_parallelnumpy array, dtype=bool
Boolean array align with ring-cation pairs, informing whether they form ‘strict’ pi-cation. If false, only distance cutoff is met, therefore the interaction is ‘crude’.
-
oddt.interactions.
pi_metal
(mol1, mol2, cutoff=5, tolerance=30)[source]¶ Returns pairs of ring-metal atoms, which meet pi-metal criteria
- Parameters
- mol1, mol2oddt.toolkit.Molecule object
Molecules to compute ring-metal pairs
- cutofffloat, (default=5)
Distance cutoff for Pi-metal pairs
- toleranceint, (default=30)
Range (+/- tolerance) from perfect direction (perpendicular) in which pi-metal are considered as strict.
- Returns
- r1ring_dict-type numpy array
Aligned rings forming pi-metal
- matom_dict-type numpy array
Aligned metals forming pi-metal
- strict_parallelnumpy array, dtype=bool
Boolean array align with ring-metal pairs, informing whether they form ‘strict’ pi-metal. If false, only distance cutoff is met, therefore the interaction is ‘crude’.
-
oddt.interactions.
pi_stacking
(mol1, mol2, cutoff=5, tolerance=30)[source]¶ Returns pairs of rings, which meet pi stacking criteria
- Parameters
- mol1, mol2oddt.toolkit.Molecule object
Molecules to compute ring pairs
- cutofffloat, (default=5)
Distance cutoff for Pi-stacking pairs
- toleranceint, (default=30)
Range (+/- tolerance) from perfect direction (parallel or perpendicular) in which pi-stackings are considered as strict.
- Returns
- r1, r2ring_dict-type numpy array
Aligned arrays of rings forming pi-stacking
- strict_parallelnumpy array, dtype=bool
Boolean array align with ring pairs, informing whether rings form ‘strict’ parallel pi-stacking. If false, only distance cutoff is met, therefore the stacking is ‘crude’.
- strict_perpendicularnumpy array, dtype=bool
Boolean array align with ring pairs, informing whether rings form ‘strict’ perpendicular pi-stacking (T-shaped, T-face, etc.). If false, only distance cutoff is met, therefore the stacking is ‘crude’.
-
oddt.interactions.
salt_bridge_plus_minus
(mol1, mol2, cutoff=4, cation_exact=False, anion_exact=False)[source]¶ Returns pairs of plus-mins atoms, which meet salt bridge criteria
- Parameters
- mol1, mol2oddt.toolkit.Molecule object
Molecules to compute plus and minus pairs
- cutofffloat, (default=4)
Distance cutoff for A-H pairs
- cation_exact, anion_exactbool
Requires interacting atoms to have non-zero formal charge.
- Returns
- plus, minusatom_dict-type numpy array
Aligned arrays of atoms forming salt bridge, firstly plus, secondly minus
-
oddt.interactions.
salt_bridges
(mol1, mol2, cutoff=4, mol1_exact=False, mol2_exact=False)[source]¶ Calculates salt bridges between molecules
- Parameters
- mol1, mol2oddt.toolkit.Molecule object
Molecules to compute plus and minus pairs
- cutofffloat, (default=4)
Distance cutoff for plus-minus pairs
- cation_exact, anion_exactbool
Requires interacting atoms to have non-zero formal charge.
- Returns
- mol1_atoms, mol2_atomsatom_dict-type numpy array
Aligned arrays of atoms forming salt bridges
oddt.metrics module¶
Metrics for estimating performance of drug discovery methods implemented in ODDT
-
oddt.metrics.
auc
(x, y)[source]¶ Compute Area Under the Curve (AUC) using the trapezoidal rule.
This is a general function, given points on a curve. For computing the area under the ROC-curve, see
roc_auc_score()
. For an alternative way to summarize a precision-recall curve, seeaverage_precision_score()
.- Parameters
- xndarray of shape (n,)
x coordinates. These must be either monotonic increasing or monotonic decreasing.
- yndarray of shape, (n,)
y coordinates.
- Returns
- aucfloat
See also
roc_auc_score
Compute the area under the ROC curve.
average_precision_score
Compute average precision from prediction scores.
precision_recall_curve
Compute precision-recall pairs for different probability thresholds.
Examples
>>> import numpy as np >>> from sklearn import metrics >>> y = np.array([1, 1, 2, 2]) >>> pred = np.array([0.1, 0.4, 0.35, 0.8]) >>> fpr, tpr, thresholds = metrics.roc_curve(y, pred, pos_label=2) >>> metrics.auc(fpr, tpr) 0.75
-
oddt.metrics.
bedroc
(y_true, y_score, alpha=20.0, pos_label=None)[source]¶ Computes Boltzmann-Enhanced Discrimination of Receiver Operating Characteristic [1]. This function assumes that results are already sorted and samples with best predictions are first.
- Parameters
- y_truearray, shape=[n_samples]
True binary labels, in range {0,1} or {-1,1}. If positive label is different than 1, it must be explicitly defined.
- y_scorearray, shape=[n_samples]
Scores for tested series of samples
- alpha: float
Alpha. 1/Alpha should be proportional to the percentage in EF.
- pos_label: int
Positive label of samples (if other than 1)
- Returns
- bedroc_scorefloat
Boltzmann-Enhanced Discrimination of Receiver Operating Characteristic
References
- 1
Truchon J-F, Bayly CI. Evaluating virtual screening methods: good and bad metrics for the “early recognition” problem. J Chem Inf Model. 2007;47: 488-508. DOI: 10.1021/ci600426e
-
oddt.metrics.
enrichment_factor
(y_true, y_score, percentage=1, pos_label=None, kind='fold')[source]¶ Computes enrichment factor for given percentage, i.e. EF_1% is enrichment factor for first percent of given samples. This function assumes that results are already sorted and samples with best predictions are first.
- Parameters
- y_truearray, shape=[n_samples]
True binary labels, in range {0,1} or {-1,1}. If positive label is different than 1, it must be explicitly defined.
- y_scorearray, shape=[n_samples]
Scores for tested series of samples
- percentageint or float
The percentage for which EF is being calculated
- pos_label: int
Positive label of samples (if other than 1)
- kind: ‘fold’ or ‘percentage’ (default=’fold’)
Two kinds of enrichment factor: fold and percentage. Fold shows the increase over random distribution (1 is random, the higher EF the better enrichment). Percentage returns the fraction of positive labels within the top x% of dataset.
- Returns
- effloat
Enrichment Factor for given percenage in range 0:1
-
oddt.metrics.
random_roc_log_auc
(log_min=0.001, log_max=1.0)[source]¶ Computes area under semi-log ROC for random distribution.
- Parameters
- log_minfloat (default=0.001)
Minimum logarithm value for estimating AUC
- log_maxfloat (default=1.)
Maximum logarithm value for estimating AUC.
- Returns
- aucfloat
semi-log ROC AUC for random distribution
-
oddt.metrics.
rie
(y_true, y_score, alpha=20, pos_label=None)[source]¶ Computes Robust Initial Enhancement [1]. This function assumes that results are already sorted and samples with best predictions are first.
- Parameters
- y_truearray, shape=[n_samples]
True binary labels, in range {0,1} or {-1,1}. If positive label is different than 1, it must be explicitly defined.
- y_scorearray, shape=[n_samples]
Scores for tested series of samples
- alpha: float
Alpha. 1/Alpha should be proportional to the percentage in EF.
- pos_label: int
Positive label of samples (if other than 1)
- Returns
- rie_scorefloat
Robust Initial Enhancement
References
- 1
Sheridan, R. P.; Singh, S. B.; Fluder, E. M.; Kearsley, S. K. Protocols for bridging the peptide to nonpeptide gap in topological similarity searches. J. Chem. Inf. Comput. Sci. 2001, 41, 1395-1406. DOI: 10.1021/ci0100144
-
oddt.metrics.
rmse
(y_true, y_pred)[source]¶ Compute Root Mean Squared Error (RMSE)
- Parameters
- y_truearray-like of shape = [n_samples] or [n_samples, n_outputs]
Ground truth (correct) target values.
- y_predarray-like of shape = [n_samples] or [n_samples, n_outputs]
Estimated target values.
- Returns
- rmsefloat
A positive floating point value (the best value is 0.0).
-
oddt.metrics.
roc
(y_true, y_score, *, pos_label=None, sample_weight=None, drop_intermediate=True)¶ Compute Receiver operating characteristic (ROC).
Note: this implementation is restricted to the binary classification task.
Read more in the User Guide.
- Parameters
- y_truendarray of shape (n_samples,)
True binary labels. If labels are not either {-1, 1} or {0, 1}, then pos_label should be explicitly given.
- y_scorendarray of shape (n_samples,)
Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).
- pos_labelint or str, default=None
The label of the positive class. When
pos_label=None
, if y_true is in {-1, 1} or {0, 1},pos_label
is set to 1, otherwise an error will be raised.- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- drop_intermediatebool, default=True
Whether to drop some suboptimal thresholds which would not appear on a plotted ROC curve. This is useful in order to create lighter ROC curves.
New in version 0.17: parameter drop_intermediate.
- Returns
- fprndarray of shape (>2,)
Increasing false positive rates such that element i is the false positive rate of predictions with score >= thresholds[i].
- tprndarray of shape (>2,)
Increasing true positive rates such that element i is the true positive rate of predictions with score >= thresholds[i].
- thresholdsndarray of shape = (n_thresholds,)
Decreasing thresholds on the decision function used to compute fpr and tpr. thresholds[0] represents no instances being predicted and is arbitrarily set to max(y_score) + 1.
See also
plot_roc_curve
Plot Receiver operating characteristic (ROC) curve.
RocCurveDisplay
ROC Curve visualization.
det_curve
Compute error rates for different probability thresholds.
roc_auc_score
Compute the area under the ROC curve.
Notes
Since the thresholds are sorted from low to high values, they are reversed upon returning them to ensure they correspond to both
fpr
andtpr
, which are sorted in reversed order during their calculation.References
- 1
- 2
Fawcett T. An introduction to ROC analysis[J]. Pattern Recognition Letters, 2006, 27(8):861-874.
Examples
>>> import numpy as np >>> from sklearn import metrics >>> y = np.array([1, 1, 2, 2]) >>> scores = np.array([0.1, 0.4, 0.35, 0.8]) >>> fpr, tpr, thresholds = metrics.roc_curve(y, scores, pos_label=2) >>> fpr array([0. , 0. , 0.5, 0.5, 1. ]) >>> tpr array([0. , 0.5, 0.5, 1. , 1. ]) >>> thresholds array([1.8 , 0.8 , 0.4 , 0.35, 0.1 ])
-
oddt.metrics.
roc_auc
(y_true, y_score, pos_label=None, ascending_score=True)[source]¶ Computes ROC AUC score
- Parameters
- y_truearray, shape=[n_samples]
True binary labels, in range {0,1} or {-1,1}. If positive label is different than 1, it must be explicitly defined.
- y_scorearray, shape=[n_samples]
Scores for tested series of samples
- pos_label: int
Positive label of samples (if other than 1)
- ascending_score: bool (default=True)
Indicates if your score is ascendig. Ascending score icreases with deacreasing activity. In other words it ascends on ranking list (where actives are on top).
- Returns
- roc_aucfloat
ROC AUC in range 0:1
-
oddt.metrics.
roc_log_auc
(y_true, y_score, pos_label=None, ascending_score=True, log_min=0.001, log_max=1.0)[source]¶ Computes area under semi-log ROC.
- Parameters
- y_truearray, shape=[n_samples]
True binary labels, in range {0,1} or {-1,1}. If positive label is different than 1, it must be explicitly defined.
- y_scorearray, shape=[n_samples]
Scores for tested series of samples
- pos_label: int
Positive label of samples (if other than 1)
- ascending_score: bool (default=True)
Indicates if your score is ascendig. Ascending score icreases with deacreasing activity. In other words it ascends on ranking list (where actives are on top).
- log_minfloat (default=0.001)
Minimum value for estimating AUC. Lower values will be clipped for numerical stability.
- log_maxfloat (default=1.)
Maximum value for estimating AUC. Higher values will be ignored.
- Returns
- aucfloat
semi-log ROC AUC
oddt.pandas module¶
oddt.shape module¶
-
oddt.shape.
common_usr
(molecule, ctd=None, cst=None, fct=None, ftf=None, atoms_type=None)[source]¶ Function used in USR and USRCAT function
- Parameters
- moleculeoddt.toolkit.Molecule
Molecule to compute USR shape descriptor
- ctdnumpy array or None (default = None)
Coordinates of the molecular centroid If ‘None’, the point is calculated
- cstnumpy array or None (default = None)
Coordinates of the closest atom to the molecular centroid If ‘None’, the point is calculated
- fctnumpy array or None (default = None)
Coordinates of the farthest atom to the molecular centroid If ‘None’, the point is calculated
- ftfnumpy array or None (default = None)
Coordinates of the farthest atom to the farthest atom to the molecular centroid If ‘None’, the point is calculated
- atoms_typestr or None (default None)
Type of atoms to be selected from atom_dict If ‘None’, all atoms are used to calculate shape descriptor
- Returns
- shape_descriptornumpy array, shape = (12)
Array describing shape of molecule
-
oddt.shape.
electroshape
(mol)[source]¶ Computes shape descriptor based on Armstrong, M. S. et al. ElectroShape: fast molecular similarity calculations incorporating shape, chirality and electrostatics. J Comput Aided Mol Des 24, 789-801 (2010). http://dx.doi.org/doi:10.1007/s10822-010-9374-0
Aside from spatial coordinates, atoms’ charges are also used as the fourth dimension to describe shape of the molecule.
- Parameters
- moloddt.toolkit.Molecule
Molecule to compute Electroshape descriptor
- Returns
- shape_descriptornumpy array, shape = (15)
Array describing shape of molecule
-
oddt.shape.
usr
(molecule)[source]¶ Computes USR shape descriptor based on Ballester PJ, Richards WG (2007). Ultrafast shape recognition to search compound databases for similar molecular shapes. Journal of computational chemistry, 28(10):1711-23. http://dx.doi.org/10.1002/jcc.20681
- Parameters
- moleculeoddt.toolkit.Molecule
Molecule to compute USR shape descriptor
- Returns
- shape_descriptornumpy array, shape = (12)
Array describing shape of molecule
-
oddt.shape.
usr_cat
(molecule)[source]¶ Computes USRCAT shape descriptor based on Adrian M Schreyer, Tom Blundell (2012). USRCAT: real-time ultrafast shape recognition with pharmacophoric constraints. Journal of Cheminformatics, 2012 4:27. http://dx.doi.org/10.1186/1758-2946-4-27
- Parameters
- moleculeoddt.toolkit.Molecule
Molecule to compute USRCAT shape descriptor
- Returns
- shape_descriptornumpy array, shape = (60)
Array describing shape of molecule
-
oddt.shape.
usr_similarity
(mol1_shape, mol2_shape, ow=1.0, hw=1.0, rw=1.0, aw=1.0, dw=1.0)[source]¶ Computes similarity between molecules
- Parameters
- mol1_shapenumpy array
USR shape descriptor
- mol2_shapenumpy array
USR shape descriptor
- owfloat (default = 1.)
Scaling factor for all atoms Only used for USRCAT, ignored for other types
- hwfloat (default = 1.)
Scaling factor for hydrophobic atoms Only used for USRCAT, ignored for other types
- rwfloat (default = 1.)
Scaling factor for aromatic atoms Only used for USRCAT, ignored for other types
- awfloat (default = 1.)
Scaling factor for acceptors Only used for USRCAT, ignored for other types
- dwfloat (default = 1.)
Scaling factor for donors Only used for USRCAT, ignored for other types
- Returns
- similarityfloat from 0 to 1
Similarity between shapes of molecules, 1 indicates identical molecules
oddt.spatial module¶
Spatial functions included in ODDT Mainly used by other modules, but can be accessed directly.
-
oddt.spatial.
angle
(p1, p2, p3)[source]¶ Returns an angle from a series of 3 points (point #2 is centroid). Angle is returned in degrees.
- Parameters
- p1,p2,p3numpy arrays, shape = [n_points, n_dimensions]
Triplets of points in n-dimensional space, aligned in rows.
- Returns
- anglesnumpy array, shape = [n_points]
Series of angles in degrees
-
oddt.spatial.
angle_2v
(v1, v2)[source]¶ Returns an angle between two vecors.Angle is returned in degrees.
- Parameters
- v1,v2numpy arrays, shape = [n_vectors, n_dimensions]
Pairs of vectors in n-dimensional space, aligned in rows.
- Returns
- anglesnumpy array, shape = [n_vectors]
Series of angles in degrees
-
oddt.spatial.
dihedral
(p1, p2, p3, p4)[source]¶ Returns an dihedral angle from a series of 4 points. Dihedral is returned in degrees. Function distingishes clockwise and antyclockwise dihedrals.
- Parameters
- p1, p2, p3, p4numpy arrays, shape = [n_points, n_dimensions]
Quadruplets of points in n-dimensional space, aligned in rows.
- Returns
- anglesnumpy array, shape = [n_points]
Series of angles in degrees
-
oddt.spatial.
distance
(x, y)[source]¶ Computes distance between each pair of points from x and y.
- Parameters
- xnumpy arrays, shape = [n_x, 3]
Array of poinds in 3D
- ynumpy arrays, shape = [n_y, 3]
Array of poinds in 3D
- Returns
- dist_matrixnumpy arrays, shape = [n_x, n_y]
Distance matrix
-
oddt.spatial.
rmsd
(ref, mol, ignore_h=True, method=None, normalize=False)[source]¶ Computes root mean square deviation (RMSD) between two molecules (including or excluding Hydrogens). No symmetry checks are performed.
- Parameters
- refoddt.toolkit.Molecule object
Reference molecule for the RMSD calculation
- moloddt.toolkit.Molecule object
Query molecule for RMSD calculation
- ignore_hbool (default=False)
Flag indicating to ignore Hydrogen atoms while performing RMSD calculation. This toggle works only with ‘hungarian’ method and without sorting (method=None).
- methodstr (default=None)
The method to be used for atom asignment between ref and mol. None means that direct matching is applied, which is the default behavior. Available methods:
canonize - match heavy atoms using canonical ordering (it forces
ignoring H’s) - hungarian - minimize RMSD using Hungarian algorithm - min_symmetry - makes multiple molecule-molecule matches and finds minimal RMSD (the slowest). Hydrogens are ignored.
- normalizebool (default=False)
Normalize RMSD by square root of rot. bonds
- Returns
- rmsdfloat
RMSD between two molecules
-
oddt.spatial.
rotate
(coords, alpha, beta, gamma)[source]¶ Rotate coords by cerain angle in X, Y, Z. Angles are specified in radians.
- Parameters
- coordsnumpy arrays, shape = [n_points, 3]
Coordinates in 3-dimensional space.
- alpha, beta, gamma: float
Angles to rotate the coordinates along X, Y and Z axis. Angles are specified in radians.
- Returns
- new_coordsnumpy arrays, shape = [n_points, 3]
Rorated coordinates in 3-dimensional space.
oddt.surface module¶
This module generates and does computation with molecular surfaces.
-
oddt.surface.
find_surface_residues
(molecule, max_dist=None, scaling=1.0)[source]¶ Finds residues close to the molecular surface using generate_surface_marching_cubes. Ignores hydrogens and waters present in the molecule.
- Parameters
- moleculeoddt.toolkit.Molecule
Molecule to find surface residues in.
- max_distarray_like, numeric or None (default = None)
Maximum distance from the surface where residues would still be considered close. If None, compares distances to radii of respective atoms.
- scalingfloat (default = 1.0)
Expands the grid in which computation is done by generate_surface_marching_cubes by a factor of scaling. Results in a more accurate representation of the surface, and therefore more accurate computation of distances but increases computation time.
- Returns
- atom_dictnumpy array
An atom_dict containing only the surface residues from the original molecule.
-
oddt.surface.
generate_surface_marching_cubes
(molecule, remove_hoh=False, scaling=1.0, probe_radius=1.4)[source]¶ Generates a molecular surface mesh using the marching_cubes method from scikit-image. Ignores hydrogens present in the molecule.
- Parameters
- moleculeoddt.toolkit.Molecule object
Molecule for which the surface will be generated
- remove_hohbool (default = False)
If True, remove waters from the molecule before generating the surface. Requires molecule.protein to be set to True.
- scalingfloat (default = 1.0)
Expands the grid in which computation is done by a factor of scaling. Results in a more accurate representation of the surface, but increases computation time.
- probe_radiusfloat (default = 1.4)
Radius of a ball used to patch up holes inside the molecule resulting from some molecular distances being larger (usually in protein). Basically reduces the surface to one accesible by other molecules of radius smaller than probe_radius.
- Returns
- vertsnumpy array
Spatial coordinates for mesh vertices.
- facesnumpy array
Faces are defined by referencing vertices from verts.
oddt.utils module¶
Common utilities for ODDT
-
oddt.utils.
check_molecule
(mol, force_protein=False, force_coords=False, non_zero_atoms=False)[source]¶ Universal validator of molecule objects. Usage of positional arguments is allowed only for molecule object, otherwise it is prohibitted (i.e. the order of arguments will change). Desired properties of molecule are validated based on specified arguments. By default only the object type is checked. In case of discrepancy to the specification a ValueError is raised with appropriate message.
New in version 0.6.
- Parameters
- mol: oddt.toolkit.Molecule object
Object to verify
- force_protein: bool (default=False)
Force the molecule to be marked as protein (mol.protein).
- force_coords: bool (default=False)
Force the molecule to have non-zero coordinates.
- non_zero_atoms: bool (default=False)
Check if molecule has at least one atom.
-
oddt.utils.
chunker
(iterable, chunksize=100)[source]¶ Generate chunks from a generator object. If iterable is passed which is not a generator it will be converted to one with iter().
New in version 0.6.
-
oddt.utils.
compose_iter
(iterable, funcs)[source]¶ Chain functions and apply them to iterable, by exhausting the iterable. Functions are executed in the order from funcs.
New in version 0.6.
-
oddt.utils.
is_molecule
(obj)[source]¶ Check whether an object is an oddt.toolkits.{rdk,ob}.Molecule instance.
New in version 0.6.
-
oddt.utils.
is_openbabel_molecule
(obj)[source]¶ Check whether an object is an oddt.toolkits.ob.Molecule instance.
New in version 0.6.
oddt.virtualscreening module¶
ODDT pipeline framework for virtual screening
-
class
oddt.virtualscreening.
virtualscreening
(n_cpu=- 1, verbose=False, chunksize=100)[source]¶ Bases:
object
Virtual Screening pipeline stack
- Parameters
- n_cpu: int (default=-1)
The number of parallel procesors to use
- verbose: bool (default=False)
Verbosity flag for some methods
Methods
apply_filter
(expression[, soft_fail])Filtering method, can use raw expressions (strings to be evaled in if statement, can use oddt.toolkit.Molecule methods, eg. mol.molwt < 500) Currently supported presets: * Lipinski Rule of 5 (‘ro5’ or ‘l5’) * Fragment Rule of 3 (‘ro3’) * PAINS filter (‘pains’).
dock
(engine, protein, *args, **kwargs)Docking procedure.
fetch
()A method to exhaust the pipeline.
load_ligands
(fmt, ligands_file, **kwargs)Loads file with ligands.
score
(function[, protein])Scoring procedure compatible with any scoring function implemented in ODDT and other pickled SFs which are subclasses of oddt.scoring.scorer.
similarity
(method, query[, cutoff, protein])Similarity filter. Supported structural methods:
write
(fmt, filename[, csv_filename])Outputs molecules to a file
write_csv
(csv_filename[, fields, keep_pipe])Outputs molecules to a csv file
-
apply_filter
(expression, soft_fail=0)[source]¶ Filtering method, can use raw expressions (strings to be evaled in if statement, can use oddt.toolkit.Molecule methods, eg. mol.molwt < 500) Currently supported presets:
Lipinski Rule of 5 (‘ro5’ or ‘l5’)
Fragment Rule of 3 (‘ro3’)
PAINS filter (‘pains’)
- Parameters
- expression: string or list of strings
Expresion(s) to be used while filtering.
- soft_fail: int (default=0)
The number of faulures molecule can have to pass filter, aka. soft-fails.
-
dock
(engine, protein, *args, **kwargs)[source]¶ Docking procedure.
- Parameters
- engine: string
Which docking engine to use.
Notes
Additional parameters are passed directly to the engine. Following docking engines are supported:
1. Audodock Vina (
`engine="autodock_vina"`
), seeoddt.docking.autodock_vina
.
-
load_ligands
(fmt, ligands_file, **kwargs)[source]¶ Loads file with ligands.
- Parameters
- file_type: string
Type of molecular file
- ligands_file: string
Path to a file, which is loaded to pipeline
-
score
(function, protein=None, *args, **kwargs)[source]¶ Scoring procedure compatible with any scoring function implemented in ODDT and other pickled SFs which are subclasses of oddt.scoring.scorer.
- Parameters
- function: string
Which scoring function to use.
- protein: oddt.toolkit.Molecule
Default protein to use as reference
Notes
Additional parameters are passed directly to the scoring function.
-
similarity
(method, query, cutoff=0.9, protein=None)[source]¶ - Similarity filter. Supported structural methods:
ift: interaction fingerprints
sift: simple interaction fingerprints
usr: Ultrafast Shape recognition
usr_cat: Ultrafast Shape recognition, Credo Atom Types
electroshape: Electroshape, an USR method including partial charges
- Parameters
- method: string
Similarity method used to compare molecules. Avaiale methods: * ifp - interaction fingerprint (requires a receptor) * sifp - simple interaction fingerprint (requires a receptor) * usr - Ultrafast Shape Reckognition * usr_cat - USR, with CREDO atom types * electroshape - Electroshape, USR with moments representing partial charge
- query: oddt.toolkit.Molecule or list of oddt.toolkit.Molecule
Query molecules to compare the pipeline to.
- cutoff: float
Similarity cutoff for filtering molecules. Any similarity lower than it will be filtered out.
- protein: oddt.toolkit.Molecule (default = None)
Protein for underling method. By default it’s empty, but sturctural fingerprints need one.
-
write
(fmt, filename, csv_filename=None, **kwargs)[source]¶ Outputs molecules to a file
- Parameters
- file_type: string
Type of molecular file
- ligands_file: string
Path to a output file
- csv_filename: string
Optional path to a CSV file
-
write_csv
(csv_filename, fields=None, keep_pipe=False, **kwargs)[source]¶ Outputs molecules to a csv file
- Parameters
- csv_filename: string
Optional path to a CSV file
- fields: list (default None)
List of fields to save in CSV file
- keep_pipe: bool (default=False)
If set to True, the ligand pipe is sustained.