oddt.scoring.functions package

Submodules

oddt.scoring.functions.NNScore module

class oddt.scoring.functions.NNScore.nnscore(protein=None, n_jobs=-1)[source]

Bases: oddt.scoring.scorer

NNScore implementation [1]. Based on BINANA descriptors [2] and an ensemble of the 20 best-scoring neural networks, each with a hidden layer of 5 nodes. NNScore predicts binding affinity (pKi/d).

Parameters:
protein : oddt.toolkit.Molecule object

Receptor for the scored ligands

n_jobs: int (default=-1)

Number of cores to use for scoring and training. By default (-1) all cores are allocated.
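The n_jobs convention described above can be sketched in plain Python. This is a minimal illustration, not ODDT's own code: the helper name resolve_n_jobs is hypothetical, and it only assumes that -1 stands for "every core the interpreter can see".

```python
import os


def resolve_n_jobs(n_jobs=-1):
    """Translate an n_jobs setting into a concrete worker count.

    Hypothetical helper: -1 means "use all available cores",
    any positive integer is taken literally.
    """
    available = os.cpu_count() or 1  # cpu_count() may return None
    if n_jobs == -1:
        return available
    if n_jobs < 1:
        raise ValueError("n_jobs must be -1 or a positive integer")
    return n_jobs
```

On a shared machine it is often friendlier to pass an explicit positive value than to rely on the default.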

References

[1] Durrant JD, McCammon JA. NNScore 2.0: a neural-network receptor-ligand scoring function. J Chem Inf Model. 2011;51: 2897-2903. doi:10.1021/ci2003889
[2] Durrant JD, McCammon JA. BINANA: a novel algorithm for ligand-binding characterization. J Mol Graph Model. 2011;29: 888-893. doi:10.1016/j.jmgm.2011.01.004
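To make the ensemble idea concrete, here is a small pure-Python sketch of 20 networks with 5 hidden nodes each whose outputs are averaged. It is an illustration under assumptions, not the ODDT implementation: the weights are random rather than trained, the tanh activation is an arbitrary choice, and the function names are hypothetical.

```python
import math
import random


def make_network(n_inputs, n_hidden=5, rng=random):
    """Create random weights for a one-hidden-layer network (untrained, illustrative)."""
    # Each hidden row holds n_inputs weights followed by one bias term.
    hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)]
              for _ in range(n_hidden)]
    # The output layer holds n_hidden weights followed by one bias term.
    output = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return hidden, output


def forward(network, x):
    """Compute the scalar output of one network for descriptor vector x."""
    hidden, output = network
    activations = [
        math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1])
        for row in hidden
    ]
    return sum(w * a for w, a in zip(output, activations)) + output[-1]


def ensemble_predict(networks, x):
    """Average the predictions of all networks in the ensemble."""
    return sum(forward(net, x) for net in networks) / len(networks)


rng = random.Random(0)
networks = [make_network(3, rng=rng) for _ in range(20)]
prediction = ensemble_predict(networks, [0.1, 0.2, 0.3])
```

Averaging several small networks tends to smooth out the quirks of any single one, which is the usual motivation for an ensemble.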

Methods

fit(ligands, target, *args, **kwargs) Trains model on supplied ligands and target values
predict(ligands, *args, **kwargs) Predicts values (e.g. affinity) for supplied ligands.
predict_ligand(ligand) Local method to score one ligand and update its scores.
predict_ligands(ligands) Method to score ligands in a lazy fashion.
save(filename) Saves scoring function to a pickle file.
score(ligands, target, *args, **kwargs) Estimates the quality of prediction using the model’s default score (accuracy for classification or R^2 for regression).
set_protein(protein) Proxy method to update protein in all relevant places.
gen_training_data  
load  
train  
fit(ligands, target, *args, **kwargs)

Trains model on supplied ligands and target values

Parameters:
ligands: array-like of ligands

Molecules to featurize and feed into the model

target: array-like of shape = [n_samples] or [n_samples, n_outputs]

Ground truth (correct) target values.

gen_training_data(pdbbind_dir, pdbbind_versions=(2007, 2012, 2013, 2014, 2015, 2016), home_dir=None, use_proteins=False)[source]
classmethod load(filename=None, pdbbind_version=2016)[source]

Loads scoring function from a pickle file.

Parameters:
filename: string

Pickle filename

Returns:
sf: scorer-like object

Scoring function object loaded from a pickle

predict(ligands, *args, **kwargs)

Predicts values (e.g. affinity) for supplied ligands.

Parameters:
ligands: array-like of ligands

Molecules to featurize and feed into the model

Returns:
predicted: np.array or array of np.arrays of shape = [n_ligands]

Predicted scores for ligands
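The predicted values are on the pKi/d scale mentioned in the class description, i.e. the negative base-10 logarithm of the inhibition or dissociation constant expressed in mol/L. The sketch below converts between the two; the function names are hypothetical and not part of ODDT.

```python
import math


def affinity_to_pk(k_molar):
    """Convert a Ki or Kd given in mol/L to the pK scale (-log10)."""
    if k_molar <= 0:
        raise ValueError("the constant must be positive")
    return -math.log10(k_molar)


def pk_to_affinity(pk):
    """Convert a pK value back to a constant in mol/L."""
    return 10.0 ** (-pk)
```

A higher pK therefore means tighter binding: a 1 nM binder sits near 9, a 1 µM binder near 6.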

predict_ligand(ligand)

Local method to score one ligand and update its scores.

Parameters:
ligand: oddt.toolkit.Molecule object

Ligand to be scored

Returns:
ligand: oddt.toolkit.Molecule object

Scored ligand with updated scores

predict_ligands(ligands)

Method to score ligands in a lazy fashion.

Parameters:
ligands: iterable of oddt.toolkit.Molecule objects

Ligands to be scored

Returns:
ligand: iterator of oddt.toolkit.Molecule objects

Scored ligands with updated scores
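"Lazy" here means that ligands are scored only as the returned iterator is consumed. The generator below sketches that behaviour with stand-ins: dictionaries play the role of molecules and toy_score is a hypothetical scorer, so nothing here reflects ODDT internals.

```python
def score_lazily(ligands, score_one):
    """Yield scored ligands one at a time; nothing is computed until iteration."""
    for ligand in ligands:
        yield score_one(ligand)


def toy_score(ligand):
    """Hypothetical scorer: attach a fake score to a dict standing in for a molecule."""
    scored = dict(ligand)
    scored["toy_score"] = float(len(ligand["name"]))
    return scored


# The input can itself be a generator, so the full library never sits in memory.
ligands = ({"name": name} for name in ["aspirin", "caffeine"])
scored = score_lazily(ligands, toy_score)
first = next(scored)  # only the first ligand has been scored at this point
```

This pattern suits large screening libraries, where holding every scored molecule in a list would be wasteful.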

save(filename)

Saves scoring function to a pickle file.

Parameters:
filename: string

Pickle filename
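The save/load pair amounts to a pickle round trip. The sketch below shows the pattern with Python's standard pickle module; the helper names and the dictionary standing in for a trained scorer are assumptions made for illustration.

```python
import os
import pickle
import tempfile


def save_scorer(scorer, filename):
    """Serialize an object to a pickle file."""
    with open(filename, "wb") as handle:
        pickle.dump(scorer, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_scorer(filename):
    """Restore an object from a pickle file."""
    with open(filename, "rb") as handle:
        return pickle.load(handle)


# A plain dict stands in for a trained scoring function.
toy_scorer = {"model": "linear", "weights": [0.5, -1.25, 2.0]}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_scorer.pickle")
    save_scorer(toy_scorer, path)
    restored = load_scorer(path)
```

Unpickling can execute arbitrary code, so only load pickle files that come from a trusted source.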

score(ligands, target, *args, **kwargs)

Estimates the quality of prediction using the model’s default score (accuracy for classification or R^2 for regression).

Parameters:
ligands: array-like of ligands

Molecules to featurize and feed into the model

target: array-like of shape = [n_samples] or [n_samples, n_outputs]

Ground truth (correct) target values.

Returns:
s: float

Quality score (accuracy or R^2) for prediction
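For regression models the quality score is the coefficient of determination, R^2 = 1 - SS_res / SS_tot. The function below is a minimal pure-Python sketch of that formula; the name r_squared is hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A value of 1 means perfect prediction, 0 means no better than predicting the mean, and negative values are possible for models that do worse than the mean.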

set_protein(protein)

Proxy method to update protein in all relevant places.

Parameters:
protein: oddt.toolkit.Molecule object

New default protein

train(home_dir=None, sf_pickle=None, pdbbind_version=2016)[source]

oddt.scoring.functions.PLECscore module

class oddt.scoring.functions.PLECscore.PLECscore(protein=None, n_jobs=-1, version='linear', depth_protein=5, depth_ligand=1, size=65536)[source]

Bases: oddt.scoring.scorer

PLECscore - a novel scoring function based on PLEC fingerprints. The underlying model can be one of:

  • linear regression
  • neural network (dense, 200x200x200)
  • random forest (100 trees)

The scoring function is trained on the PDBbind v2016 database and, even with the linear model, outperforms other machine-learning scoring functions in terms of Pearson correlation coefficient on the “core set”. For details see the PLEC publication. PLECscore predicts binding affinity (pKi/d).

New in version 0.6.

Parameters:
protein : oddt.toolkit.Molecule object

Receptor for the scored ligands

n_jobs: int (default=-1)

Number of cores to use for scoring and training. By default (-1) all cores are allocated.

version: str (default='linear')

Variant of the scoring function ('linear', 'nn' or 'rf'), i.e. which model is used for scoring.

depth_protein: int (default=5)

The depth of ECFP environments generated on the protein side of the interaction. By default, 6 environments (depths 0 to 5) are generated.

depth_ligand: int (default=1)

The depth of ECFP environments generated on the ligand side of the interaction. By default, 2 environments (depths 0 to 1) are generated.

size: int (default=65536)

The final size of a folded PLEC fingerprint. This setting does not limit the data encoded in the PLEC fingerprint (tune the depths for that), only its final length. Setting it too low will lead to many collisions.
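The effect of size on collisions can be seen with a toy folding scheme. The sketch assumes folding by taking each feature identifier modulo size, which is a common approach but is stated here as an assumption about the mechanism; the function names are hypothetical.

```python
def fold_fingerprint(feature_ids, size=65536):
    """Fold raw feature identifiers into `size` positions (modulo folding assumed)."""
    return sorted({fid % size for fid in feature_ids})


def count_collisions(feature_ids, size):
    """Count distinct features that are lost because they share a folded position."""
    unique = set(feature_ids)
    return len(unique) - len({fid % size for fid in unique})


raw_ids = [1, 17, 33, 65537]
small = fold_fingerprint(raw_ids, size=16)    # every id lands on position 1
large = fold_fingerprint(raw_ids, size=65536)  # only 1 and 65537 collide
```

With size=16 all four features collapse onto one position, while the default size keeps three of them distinct.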

Methods

fit(ligands, target, *args, **kwargs) Trains model on supplied ligands and target values
predict(ligands, *args, **kwargs) Predicts values (e.g. affinity) for supplied ligands.
predict_ligand(ligand) Local method to score one ligand and update its scores.
predict_ligands(ligands) Method to score ligands in a lazy fashion.
save(filename) Saves scoring function to a pickle file.
score(ligands, target, *args, **kwargs) Estimates the quality of prediction using the model’s default score (accuracy for classification or R^2 for regression).
set_protein(protein) Proxy method to update protein in all relevant places.
gen_json  
gen_training_data  
load  
train  
fit(ligands, target, *args, **kwargs)

Trains model on supplied ligands and target values

Parameters:
ligands: array-like of ligands

Molecules to featurize and feed into the model

target: array-like of shape = [n_samples] or [n_samples, n_outputs]

Ground truth (correct) target values.

gen_json(home_dir=None, pdbbind_version=2016)[source]
gen_training_data(pdbbind_dir, pdbbind_versions=(2016, ), home_dir=None, use_proteins=True)[source]
classmethod load(filename=None, version='linear', pdbbind_version=2016, depth_protein=5, depth_ligand=1, size=65536)[source]

Loads scoring function from a pickle file.

Parameters:
filename: string

Pickle filename

Returns:
sf: scorer-like object

Scoring function object loaded from a pickle

predict(ligands, *args, **kwargs)

Predicts values (e.g. affinity) for supplied ligands.

Parameters:
ligands: array-like of ligands

Molecules to featurize and feed into the model

Returns:
predicted: np.array or array of np.arrays of shape = [n_ligands]

Predicted scores for ligands

predict_ligand(ligand)

Local method to score one ligand and update its scores.

Parameters:
ligand: oddt.toolkit.Molecule object

Ligand to be scored

Returns:
ligand: oddt.toolkit.Molecule object

Scored ligand with updated scores

predict_ligands(ligands)

Method to score ligands in a lazy fashion.

Parameters:
ligands: iterable of oddt.toolkit.Molecule objects

Ligands to be scored

Returns:
ligand: iterator of oddt.toolkit.Molecule objects

Scored ligands with updated scores

save(filename)

Saves scoring function to a pickle file.

Parameters:
filename: string

Pickle filename

score(ligands, target, *args, **kwargs)

Estimates the quality of prediction using the model’s default score (accuracy for classification or R^2 for regression).

Parameters:
ligands: array-like of ligands

Molecules to featurize and feed into the model

target: array-like of shape = [n_samples] or [n_samples, n_outputs]

Ground truth (correct) target values.

Returns:
s: float

Quality score (accuracy or R^2) for prediction

set_protein(protein)

Proxy method to update protein in all relevant places.

Parameters:
protein: oddt.toolkit.Molecule object

New default protein

train(home_dir=None, sf_pickle=None, pdbbind_version=2016, ignore_json=False)[source]

oddt.scoring.functions.RFScore module

class oddt.scoring.functions.RFScore.rfscore(protein=None, n_jobs=-1, version=1, spr=0, **kwargs)[source]

Bases: oddt.scoring.scorer

Scoring function implementing RF-Score variants. It predicts the binding affinity (pKi/d) of a ligand in a complex using simple descriptors (close contacts between atoms within 12 Å) with a sophisticated machine-learning model (random forest). The third variant supplements those contacts with Vina partial scores. For further details see the RF-Score publications: v1 [1], v2 [2], v3 [3].

Parameters:
protein : oddt.toolkit.Molecule object

Receptor for the scored ligands

n_jobs: int (default=-1)

Number of cores to use for scoring and training. By default (-1) all cores are allocated.

version: int (default=1)

Scoring function variant. The default is the simplest one (v1).

spr: int (default=0)

The minimum number of contacts in each pair of atom types in the training set for the column to be included in training. This removes infrequent and empty contacts.
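One way to read the spr threshold is as a column mask over the training descriptors. The sketch below keeps a column when at least one training complex has more than spr contacts for that atom-type pair. That exact criterion is an assumption, as are the function names; the library may apply the threshold differently.

```python
def frequent_columns(contact_rows, spr=0):
    """Return a keep/drop mask per descriptor column (assumed criterion, see above)."""
    n_columns = len(contact_rows[0])
    return [
        any(row[col] > spr for row in contact_rows)
        for col in range(n_columns)
    ]


def apply_mask(row, mask):
    """Keep only the descriptor values whose column is marked True."""
    return [value for value, keep in zip(row, mask) if keep]


# Two training complexes, three atom-type pairs.
training = [[0, 3, 1],
            [0, 5, 0]]
mask_default = frequent_columns(training, spr=0)  # drops the empty first column
mask_strict = frequent_columns(training, spr=1)   # also drops the rare third column
```

Under this reading the default spr=0 removes only columns that are empty across the whole training set.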

References

[1] Ballester PJ, Mitchell JBO. A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking. Bioinformatics. 2010;26: 1169-1175. doi:10.1093/bioinformatics/btq112
[2] Ballester PJ, Schreyer A, Blundell TL. Does a more precise chemical description of protein-ligand complexes lead to more accurate prediction of binding affinity? J Chem Inf Model. 2014;54: 944-955. doi:10.1021/ci500091r
[3] Li H, Leung K-S, Wong M-H, Ballester PJ. Improving AutoDock Vina Using Random Forest: The Growing Accuracy of Binding Affinity Prediction by the Effective Exploitation of Larger Data Sets. Mol Inform. 2015;34: 115-126. doi:10.1002/minf.201400132
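The close-contact descriptors mentioned in the class description can be illustrated by counting protein-ligand atom pairs that lie within the 12 Å cutoff, grouped by element pair. This is a simplified sketch, not the library's featurizer: atoms are plain (element, coordinates) tuples and the function name is hypothetical. It needs Python 3.8 or later for math.dist.

```python
import math
from collections import Counter


def count_contacts(protein_atoms, ligand_atoms, cutoff=12.0):
    """Count atom pairs closer than `cutoff` Å, keyed by (ligand element, protein element)."""
    counts = Counter()
    for p_elem, p_xyz in protein_atoms:
        for l_elem, l_xyz in ligand_atoms:
            if math.dist(p_xyz, l_xyz) < cutoff:
                counts[(l_elem, p_elem)] += 1
    return counts


# Toy coordinates in Å; real input would come from parsed structures.
protein = [("C", (0.0, 0.0, 0.0)), ("N", (20.0, 0.0, 0.0))]
ligand = [("O", (3.0, 0.0, 0.0)), ("C", (10.0, 0.0, 0.0))]
contacts = count_contacts(protein, ligand)
```

Each element pair becomes one descriptor column, and the resulting count vector is what a random forest would be trained on.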

Methods

fit(ligands, target, *args, **kwargs) Trains model on supplied ligands and target values
predict(ligands, *args, **kwargs) Predicts values (e.g. affinity) for supplied ligands.
predict_ligand(ligand) Local method to score one ligand and update its scores.
predict_ligands(ligands) Method to score ligands in a lazy fashion.
save(filename) Saves scoring function to a pickle file.
score(ligands, target, *args, **kwargs) Estimates the quality of prediction using the model’s default score (accuracy for classification or R^2 for regression).
set_protein(protein) Proxy method to update protein in all relevant places.
gen_training_data  
load  
train  
fit(ligands, target, *args, **kwargs)

Trains model on supplied ligands and target values

Parameters:
ligands: array-like of ligands

Molecules to featurize and feed into the model

target: array-like of shape = [n_samples] or [n_samples, n_outputs]

Ground truth (correct) target values.

gen_training_data(pdbbind_dir, pdbbind_versions=(2007, 2012, 2013, 2014, 2015, 2016), home_dir=None, use_proteins=False)[source]
classmethod load(filename=None, version=1, pdbbind_version=2016)[source]

Loads scoring function from a pickle file.

Parameters:
filename: string

Pickle filename

Returns:
sf: scorer-like object

Scoring function object loaded from a pickle

predict(ligands, *args, **kwargs)

Predicts values (e.g. affinity) for supplied ligands.

Parameters:
ligands: array-like of ligands

Molecules to featurize and feed into the model

Returns:
predicted: np.array or array of np.arrays of shape = [n_ligands]

Predicted scores for ligands

predict_ligand(ligand)

Local method to score one ligand and update its scores.

Parameters:
ligand: oddt.toolkit.Molecule object

Ligand to be scored

Returns:
ligand: oddt.toolkit.Molecule object

Scored ligand with updated scores

predict_ligands(ligands)

Method to score ligands in a lazy fashion.

Parameters:
ligands: iterable of oddt.toolkit.Molecule objects

Ligands to be scored

Returns:
ligand: iterator of oddt.toolkit.Molecule objects

Scored ligands with updated scores

save(filename)

Saves scoring function to a pickle file.

Parameters:
filename: string

Pickle filename

score(ligands, target, *args, **kwargs)

Estimates the quality of prediction using the model’s default score (accuracy for classification or R^2 for regression).

Parameters:
ligands: array-like of ligands

Molecules to featurize and feed into the model

target: array-like of shape = [n_samples] or [n_samples, n_outputs]

Ground truth (correct) target values.

Returns:
s: float

Quality score (accuracy or R^2) for prediction

set_protein(protein)

Proxy method to update protein in all relevant places.

Parameters:
protein: oddt.toolkit.Molecule object

New default protein

train(home_dir=None, sf_pickle=None, pdbbind_version=2016)[source]

Module contents

class oddt.scoring.functions.rfscore(protein=None, n_jobs=-1, version=1, spr=0, **kwargs)[source]

Bases: oddt.scoring.scorer

Scoring function implementing RF-Score variants. It predicts the binding affinity (pKi/d) of a ligand in a complex using simple descriptors (close contacts between atoms within 12 Å) with a sophisticated machine-learning model (random forest). The third variant supplements those contacts with Vina partial scores. For further details see the RF-Score publications: v1 [1], v2 [2], v3 [3].

Parameters:
protein : oddt.toolkit.Molecule object

Receptor for the scored ligands

n_jobs: int (default=-1)

Number of cores to use for scoring and training. By default (-1) all cores are allocated.

version: int (default=1)

Scoring function variant. The default is the simplest one (v1).

spr: int (default=0)

The minimum number of contacts in each pair of atom types in the training set for the column to be included in training. This removes infrequent and empty contacts.

References

[1] Ballester PJ, Mitchell JBO. A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking. Bioinformatics. 2010;26: 1169-1175. doi:10.1093/bioinformatics/btq112
[2] Ballester PJ, Schreyer A, Blundell TL. Does a more precise chemical description of protein-ligand complexes lead to more accurate prediction of binding affinity? J Chem Inf Model. 2014;54: 944-955. doi:10.1021/ci500091r
[3] Li H, Leung K-S, Wong M-H, Ballester PJ. Improving AutoDock Vina Using Random Forest: The Growing Accuracy of Binding Affinity Prediction by the Effective Exploitation of Larger Data Sets. Mol Inform. 2015;34: 115-126. doi:10.1002/minf.201400132

Methods

fit(ligands, target, *args, **kwargs) Trains model on supplied ligands and target values
predict(ligands, *args, **kwargs) Predicts values (e.g. affinity) for supplied ligands.
predict_ligand(ligand) Local method to score one ligand and update its scores.
predict_ligands(ligands) Method to score ligands in a lazy fashion.
save(filename) Saves scoring function to a pickle file.
score(ligands, target, *args, **kwargs) Estimates the quality of prediction using the model’s default score (accuracy for classification or R^2 for regression).
set_protein(protein) Proxy method to update protein in all relevant places.
gen_training_data  
load  
train  
fit(ligands, target, *args, **kwargs)

Trains model on supplied ligands and target values

Parameters:
ligands: array-like of ligands

Molecules to featurize and feed into the model

target: array-like of shape = [n_samples] or [n_samples, n_outputs]

Ground truth (correct) target values.

gen_training_data(pdbbind_dir, pdbbind_versions=(2007, 2012, 2013, 2014, 2015, 2016), home_dir=None, use_proteins=False)[source]
classmethod load(filename=None, version=1, pdbbind_version=2016)[source]

Loads scoring function from a pickle file.

Parameters:
filename: string

Pickle filename

Returns:
sf: scorer-like object

Scoring function object loaded from a pickle

predict(ligands, *args, **kwargs)

Predicts values (e.g. affinity) for supplied ligands.

Parameters:
ligands: array-like of ligands

Molecules to featurize and feed into the model

Returns:
predicted: np.array or array of np.arrays of shape = [n_ligands]

Predicted scores for ligands

predict_ligand(ligand)

Local method to score one ligand and update its scores.

Parameters:
ligand: oddt.toolkit.Molecule object

Ligand to be scored

Returns:
ligand: oddt.toolkit.Molecule object

Scored ligand with updated scores

predict_ligands(ligands)

Method to score ligands in a lazy fashion.

Parameters:
ligands: iterable of oddt.toolkit.Molecule objects

Ligands to be scored

Returns:
ligand: iterator of oddt.toolkit.Molecule objects

Scored ligands with updated scores

save(filename)

Saves scoring function to a pickle file.

Parameters:
filename: string

Pickle filename

score(ligands, target, *args, **kwargs)

Estimates the quality of prediction using the model’s default score (accuracy for classification or R^2 for regression).

Parameters:
ligands: array-like of ligands

Molecules to featurize and feed into the model

target: array-like of shape = [n_samples] or [n_samples, n_outputs]

Ground truth (correct) target values.

Returns:
s: float

Quality score (accuracy or R^2) for prediction

set_protein(protein)

Proxy method to update protein in all relevant places.

Parameters:
protein: oddt.toolkit.Molecule object

New default protein

train(home_dir=None, sf_pickle=None, pdbbind_version=2016)[source]
class oddt.scoring.functions.nnscore(protein=None, n_jobs=-1)[source]

Bases: oddt.scoring.scorer

NNScore implementation [1]. Based on BINANA descriptors [2] and an ensemble of the 20 best-scoring neural networks, each with a hidden layer of 5 nodes. NNScore predicts binding affinity (pKi/d).

Parameters:
protein : oddt.toolkit.Molecule object

Receptor for the scored ligands

n_jobs: int (default=-1)

Number of cores to use for scoring and training. By default (-1) all cores are allocated.

References

[1] Durrant JD, McCammon JA. NNScore 2.0: a neural-network receptor-ligand scoring function. J Chem Inf Model. 2011;51: 2897-2903. doi:10.1021/ci2003889
[2] Durrant JD, McCammon JA. BINANA: a novel algorithm for ligand-binding characterization. J Mol Graph Model. 2011;29: 888-893. doi:10.1016/j.jmgm.2011.01.004

Methods

fit(ligands, target, *args, **kwargs) Trains model on supplied ligands and target values
predict(ligands, *args, **kwargs) Predicts values (e.g. affinity) for supplied ligands.
predict_ligand(ligand) Local method to score one ligand and update its scores.
predict_ligands(ligands) Method to score ligands in a lazy fashion.
save(filename) Saves scoring function to a pickle file.
score(ligands, target, *args, **kwargs) Estimates the quality of prediction using the model’s default score (accuracy for classification or R^2 for regression).
set_protein(protein) Proxy method to update protein in all relevant places.
gen_training_data  
load  
train  
fit(ligands, target, *args, **kwargs)

Trains model on supplied ligands and target values

Parameters:
ligands: array-like of ligands

Molecules to featurize and feed into the model

target: array-like of shape = [n_samples] or [n_samples, n_outputs]

Ground truth (correct) target values.

gen_training_data(pdbbind_dir, pdbbind_versions=(2007, 2012, 2013, 2014, 2015, 2016), home_dir=None, use_proteins=False)[source]
classmethod load(filename=None, pdbbind_version=2016)[source]

Loads scoring function from a pickle file.

Parameters:
filename: string

Pickle filename

Returns:
sf: scorer-like object

Scoring function object loaded from a pickle

predict(ligands, *args, **kwargs)

Predicts values (e.g. affinity) for supplied ligands.

Parameters:
ligands: array-like of ligands

Molecules to featurize and feed into the model

Returns:
predicted: np.array or array of np.arrays of shape = [n_ligands]

Predicted scores for ligands

predict_ligand(ligand)

Local method to score one ligand and update its scores.

Parameters:
ligand: oddt.toolkit.Molecule object

Ligand to be scored

Returns:
ligand: oddt.toolkit.Molecule object

Scored ligand with updated scores

predict_ligands(ligands)

Method to score ligands in a lazy fashion.

Parameters:
ligands: iterable of oddt.toolkit.Molecule objects

Ligands to be scored

Returns:
ligand: iterator of oddt.toolkit.Molecule objects

Scored ligands with updated scores

save(filename)

Saves scoring function to a pickle file.

Parameters:
filename: string

Pickle filename

score(ligands, target, *args, **kwargs)

Estimates the quality of prediction using the model’s default score (accuracy for classification or R^2 for regression).

Parameters:
ligands: array-like of ligands

Molecules to featurize and feed into the model

target: array-like of shape = [n_samples] or [n_samples, n_outputs]

Ground truth (correct) target values.

Returns:
s: float

Quality score (accuracy or R^2) for prediction

set_protein(protein)

Proxy method to update protein in all relevant places.

Parameters:
protein: oddt.toolkit.Molecule object

New default protein

train(home_dir=None, sf_pickle=None, pdbbind_version=2016)[source]
class oddt.scoring.functions.PLECscore(protein=None, n_jobs=-1, version='linear', depth_protein=5, depth_ligand=1, size=65536)[source]

Bases: oddt.scoring.scorer

PLECscore - a novel scoring function based on PLEC fingerprints. The underlying model can be one of:

  • linear regression
  • neural network (dense, 200x200x200)
  • random forest (100 trees)

The scoring function is trained on the PDBbind v2016 database and, even with the linear model, outperforms other machine-learning scoring functions in terms of Pearson correlation coefficient on the “core set”. For details see the PLEC publication. PLECscore predicts binding affinity (pKi/d).

New in version 0.6.

Parameters:
protein : oddt.toolkit.Molecule object

Receptor for the scored ligands

n_jobs: int (default=-1)

Number of cores to use for scoring and training. By default (-1) all cores are allocated.

version: str (default='linear')

Variant of the scoring function ('linear', 'nn' or 'rf'), i.e. which model is used for scoring.

depth_protein: int (default=5)

The depth of ECFP environments generated on the protein side of the interaction. By default, 6 environments (depths 0 to 5) are generated.

depth_ligand: int (default=1)

The depth of ECFP environments generated on the ligand side of the interaction. By default, 2 environments (depths 0 to 1) are generated.

size: int (default=65536)

The final size of a folded PLEC fingerprint. This setting does not limit the data encoded in the PLEC fingerprint (tune the depths for that), only its final length. Setting it too low will lead to many collisions.

Methods

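fit(ligands, target, *args, **kwargs) Trains model on supplied ligands and target values
predict(ligands, *args, **kwargs) Predicts values (e.g. affinity) for supplied ligands.
predict_ligand(ligand) Local method to score one ligand and update its scores.
predict_ligands(ligands) Method to score ligands in a lazy fashion.
save(filename) Saves scoring function to a pickle file.
score(ligands, target, *args, **kwargs) Estimates the quality of prediction using the model’s default score (accuracy for classification or R^2 for regression).
set_protein(protein) Proxy method to update protein in all relevant places.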
gen_json  
gen_training_data  
load  
train  
fit(ligands, target, *args, **kwargs)

Trains model on supplied ligands and target values

Parameters:
ligands: array-like of ligands

Molecules to featurize and feed into the model

target: array-like of shape = [n_samples] or [n_samples, n_outputs]

Ground truth (correct) target values.

gen_json(home_dir=None, pdbbind_version=2016)[source]
gen_training_data(pdbbind_dir, pdbbind_versions=(2016, ), home_dir=None, use_proteins=True)[source]
classmethod load(filename=None, version='linear', pdbbind_version=2016, depth_protein=5, depth_ligand=1, size=65536)[source]

Loads scoring function from a pickle file.

Parameters:
filename: string

Pickle filename

Returns:
sf: scorer-like object

Scoring function object loaded from a pickle

predict(ligands, *args, **kwargs)

Predicts values (e.g. affinity) for supplied ligands.

Parameters:
ligands: array-like of ligands

Molecules to featurize and feed into the model

Returns:
predicted: np.array or array of np.arrays of shape = [n_ligands]

Predicted scores for ligands

predict_ligand(ligand)

Local method to score one ligand and update its scores.

Parameters:
ligand: oddt.toolkit.Molecule object

Ligand to be scored

Returns:
ligand: oddt.toolkit.Molecule object

Scored ligand with updated scores

predict_ligands(ligands)

Method to score ligands in a lazy fashion.

Parameters:
ligands: iterable of oddt.toolkit.Molecule objects

Ligands to be scored

Returns:
ligand: iterator of oddt.toolkit.Molecule objects

Scored ligands with updated scores

save(filename)

Saves scoring function to a pickle file.

Parameters:
filename: string

Pickle filename

score(ligands, target, *args, **kwargs)

Estimates the quality of prediction using the model’s default score (accuracy for classification or R^2 for regression).

Parameters:
ligands: array-like of ligands

Molecules to featurize and feed into the model

target: array-like of shape = [n_samples] or [n_samples, n_outputs]

Ground truth (correct) target values.

Returns:
s: float

Quality score (accuracy or R^2) for prediction

set_protein(protein)

Proxy method to update protein in all relevant places.

Parameters:
protein: oddt.toolkit.Molecule object

New default protein

train(home_dir=None, sf_pickle=None, pdbbind_version=2016, ignore_json=False)[source]