oddt.toolkits package
Subpackages
Submodules
oddt.toolkits.common module
Code common to all toolkits
-
oddt.toolkits.common.
canonize_ring_path
(path) Make a canonical path: a list of consecutive atom indices bonded in a ring, ordered in a uniform fashion.
1) Move the smallest index to position 0.
2) Look for the smallest first step (index difference) away from it.
3) If the step towards position -1 is the smallest, invert the path and move the minimum index back to position 0.
- Parameters
- path : list of integers
A list of consecutive atom indices in a ring
- Returns
- canonic_path : list of integers
Canonically ordered list of atom indices
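The three steps above can be sketched in plain Python with `collections.deque`. This is a minimal illustration, assuming `path` is a sequence of distinct integer atom indices; it follows the documented steps but is not the ODDT source.

```python
from collections import deque


def canonize_ring_path(path):
    """Return a canonical ordering of a ring path (sketch of the documented steps)."""
    if len(path) < 3:
        raise ValueError("a ring path needs at least three atoms")
    ring = deque(path)
    # Step 1: rotate left so that the smallest index sits at position 0.
    ring.rotate(-list(path).index(min(path)))
    # Step 2: compare the two possible first steps away from the minimum.
    forward_step = ring[1] - ring[0]
    backward_step = ring[-1] - ring[0]
    # Step 3: if stepping towards position -1 is smaller, reverse the
    # direction of travel, then rotate right to bring the minimum back to 0.
    if backward_step < forward_step:
        ring.reverse()
        ring.rotate(1)
    return list(ring)


# Three different walks around the same four-membered ring.
print(canonize_ring_path([3, 1, 4, 2]))  # [1, 3, 2, 4]
print(canonize_ring_path([2, 4, 1, 3]))  # [1, 3, 2, 4]
print(canonize_ring_path([4, 2, 3, 1]))  # [1, 3, 2, 4]
```

Because every starting atom and both walking directions collapse to the same list, the result can be used as a dictionary key or compared directly when deduplicating rings.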
oddt.toolkits.ob module
-
class
oddt.toolkits.ob.
Atom
(OBAtom) Bases:
pybel.Atom
- Attributes
- atomicmass
- atomicnum
- bonds
- cidx
- coordidx
- coords
- exactmass
- formalcharge
- heavyvalence
- heterovalence
- hyb
idx
DEPRECATED: RDKit is 0-based and OpenBabel is 1-based; use idx0 or idx1 instead.
idx0
Note that this index is 0-based, whereas OpenBabel’s internal index is 1-based.
idx1
Note that this index is 1-based, like OpenBabel’s internal index.
- implicitvalence
- isotope
- neighbors
- partialcharge
- residue
- spin
- type
- valence
- vector
-
property
bonds
-
property
idx
DEPRECATED: RDKit is 0-based and OpenBabel is 1-based. State which convention you desire and use idx0 or idx1.
Note that this index is 1-based, like OpenBabel’s internal index.
-
property
idx0
Note that this index is 0-based, whereas OpenBabel’s internal index is 1-based. Changed to be compatible with RDKit.
-
property
idx1
Note that this index is 1-based, like OpenBabel’s internal index.
-
property
neighbors
-
property
residue
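Because the deprecated idx is ambiguous across toolkits, code that indexes per-atom arrays should pick one convention explicitly. The sketch below uses a hypothetical `ToyAtom` class (not an ODDT class) that stores a 1-based native index, as OpenBabel does, and derives both `idx1` and `idx0` from it; only `idx0` lines up with 0-based Python lists and NumPy arrays.

```python
class ToyAtom:
    """Hypothetical stand-in for a toolkit atom; not part of ODDT.

    It stores a 1-based native index (OpenBabel style) and exposes both
    conventions, mirroring the idx0/idx1 properties documented above.
    """

    def __init__(self, native_idx1):
        self._native_idx1 = native_idx1

    @property
    def idx1(self):
        # 1-based, same numbering as OpenBabel's internal index.
        return self._native_idx1

    @property
    def idx0(self):
        # 0-based, same numbering as RDKit and Python sequence indexing.
        return self._native_idx1 - 1


# Hypothetical per-atom data stored in an ordinary 0-based list.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
atoms = [ToyAtom(i) for i in (1, 2, 3)]

# idx0 indexes the list directly; using idx1 here would be off by one.
for atom in atoms:
    print(atom.idx1, atom.idx0, coords[atom.idx0])
```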
-
class
oddt.toolkits.ob.
Bond
(OBBond) Bases:
object
- Attributes
- atoms
- isrotor
- order
-
property
atoms
-
property
isrotor
-
property
order
-
class
oddt.toolkits.ob.
Fingerprint
(fingerprint) Bases:
pybel.Fingerprint
- Attributes
- bits
- raw
-
property
raw
-
class
oddt.toolkits.ob.
Molecule
(OBMol=None, source=None, protein=False) Bases:
pybel.Molecule
- Attributes
- OBMol
- atom_dict
- atoms
- bonds
canonic_order
Returns np.array with canonic order of heavy atoms in the molecule
- charge
- charges
- clone
- conformers
- coords
- data
- dim
- energy
- exactmass
- formula
- molwt
num_rotors
Number of strictly rotatable bonds
protein
A flag identifying protein molecules, for which atom_dict procedures may differ.
- res_dict
- residues
- ring_dict
- smiles
- spin
- sssr
- title
- unitcell
Methods
addh
([only_polar]) Add hydrogens.
calccharges
([model]) Calculate partial charges for a molecule.
calcdesc
([descnames]) Calculate descriptor values.
calcfp
([fptype]) Calculate a molecular fingerprint.
convertdbonds
() Convert dative bonds.
draw
([show, filename, update, usecoords]) Create a 2D depiction of the molecule.
localopt
([forcefield, steps]) Locally optimize the coordinates.
make2D
() Generate 2D coordinates for the molecule.
make3D
([forcefield, steps]) Generate 3D coordinates.
removeh
() Remove hydrogens.
write
([format, filename, overwrite, opt, size]) Write the molecule to a file or return a string.
clone_coords
-
property
OBMol
-
property
atom_dict
-
property
atoms
-
property
bonds
-
calccharges
(model='gasteiger') Calculate partial charges for a molecule. By default the Gasteiger charge model is used.
- Parameters
- model : str (default="gasteiger")
Method for generating partial charges. Supported models: gasteiger, mmff94, and others supported by OpenBabel (list them with obabel -L charges).
-
property
canonic_order
Returns np.array with canonic order of heavy atoms in the molecule
-
property
charges
-
property
clone
-
property
coords
-
property
num_rotors
Number of strictly rotatable bonds
-
property
protein
A flag identifying protein molecules, for which atom_dict procedures may differ.
-
property
res_dict
-
property
residues
-
property
ring_dict
-
property
smiles
-
write
(format='smi', filename=None, overwrite=False, opt=None, size=None) Write the molecule to a file or return a string.
- Optional parameters:
- format – see the outformats variable for a list of available output formats (default is "smi")
- filename – default is None
- overwrite – if the output file already exists, should it be overwritten? (default is False)
- opt – a dictionary of format-specific options. For format options with no parameters, specify the value as None.
If a filename is specified, the result is written to a file. Otherwise, a string containing the result is returned.
To write multiple molecules to the same file, use the Outputfile class.
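The file-or-string contract of write(), together with the overwrite flag, can be sketched as a plain-Python helper. `write_or_return` is hypothetical and only mirrors the behaviour described in this docstring; the exception type and message raised when the file already exists are assumptions, not the toolkit's exact error.

```python
import os
import tempfile


def write_or_return(text, filename=None, overwrite=False):
    """Hypothetical helper mirroring the write() contract; not ODDT code.

    Without a filename the already formatted text is returned; with one,
    the text is written to disk, refusing to replace an existing file
    unless overwrite=True.
    """
    if filename is None:
        return text
    if not overwrite and os.path.isfile(filename):
        # The exception type and message here are assumptions.
        raise IOError("%s already exists. Use overwrite=True to overwrite it."
                      % filename)
    with open(filename, "w") as handle:
        handle.write(text)
    return None


# Usage: return a string, then write to (and overwrite) a temporary file.
print(write_or_return("CCO ethanol\n"), end="")
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "out.smi")
    write_or_return("CCO ethanol\n", path)
    write_or_return("CCC propane\n", path, overwrite=True)
    with open(path) as handle:
        print(handle.read(), end="")
```

Refusing to overwrite by default protects existing results; for several molecules in one file, a single open output handle (the role of Outputfile) avoids this check entirely.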
-
class
oddt.toolkits.ob.
MoleculeData
(obmol) Bases:
pybel.MoleculeData
Methods
clear
has_key
items
iteritems
keys
to_dict
update
values
-
class
oddt.toolkits.ob.
Outputfile
(format, filename, overwrite=False, opt=None) Bases:
pybel.Outputfile
Methods
close
() Close the Outputfile to further writing.
write
(molecule) Write a molecule to the output file.
-
class
oddt.toolkits.ob.
Residue
(OBResidue) Bases:
object
Represent a Pybel residue.
- Required parameter:
OBResidue – an Open Babel OBResidue
- Attributes:
atoms, idx, name.
(refer to the Open Babel library documentation for more info).
- The original Open Babel residue can be accessed using the attribute:
OBResidue
- Attributes
-
property
atoms
List of Atoms in the Residue
-
property
chain
Residue chain ID
-
property
idx
DEPRECATED: Use idx0 instead.
Internal index (0-based) of the Residue
-
property
idx0
Internal index (0-based) of the Residue
-
property
name
Residue name
-
property
number
Residue number
-
class
oddt.toolkits.ob.
Smarts
(smartspattern) Bases:
pybel.Smarts
Initialise with a SMARTS pattern.
Methods
findall
(molecule[, unique]) Find all matches of the SMARTS pattern to a particular molecule.
match
(molecule) Check whether there is any match.
-
oddt.toolkits.ob.
diverse_conformers_generator
(mol, n_conf=10, method='confab', seed=None, **kwargs) Produce diverse conformers using the current conformer as a starting point. Returns a generator. Each conformer is a copy of the original molecule object.
New in version 0.6.
- Parameters
- mol : oddt.toolkit.Molecule object
Molecule for which to generate conformers
- n_conf : int (default=10)
Target number of conformers
- method : string (default='confab')
Method for generating conformers. Supported methods: confab, ga.
- seed : None or int (default=None)
Random seed
- mutability : int (default=5)
The inverse of the probability of mutation. The default of 5 translates to a 1/5 (20%) chance of mutation. This setting only works with the genetic algorithm method ("ga").
- convergence : int (default=5)
The number of generations with unchanged fitness after which the algorithm is considered converged. This setting only works with the genetic algorithm method ("ga").
- rmsd : float (default=0.5)
Conformers are pruned unless their RMSD is higher than this cutoff. This setting only works with the Confab method ("confab").
- nconf : int (default=10000)
The number of initial conformers to generate before energy pruning. This setting only works with the Confab method ("confab").
- energy_gap : float (default=5000.)
Maximum allowed energy gap between the lowest-energy conformer and any retained conformer. This setting only works with the Confab method ("confab").
- Returns
- mols : list of oddt.toolkit.Molecule objects
Molecules with diverse conformers
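Two of the keyword settings above reduce to simple arithmetic. The helpers below are hypothetical illustrations, not OpenBabel or ODDT code: `mutation_probability` turns the "ga" mutability setting into a probability, and `within_energy_gap` applies an energy window like the one "confab" uses for energy_gap. The inclusive comparison at the window edge is an assumption, and no energy units are implied.

```python
def mutation_probability(mutability=5):
    """Chance of mutation implied by the 'mutability' setting of the "ga" method."""
    # mutability is the inverse of the probability: 5 -> 1/5 -> 0.2 (20%).
    return 1.0 / mutability


def within_energy_gap(energies, energy_gap=5000.0):
    """Indices of conformers whose energy is within energy_gap of the lowest one.

    Hypothetical helper illustrating the 'energy_gap' setting of the "confab"
    method; the inclusive comparison is an assumption.
    """
    lowest = min(energies)
    return [i for i, energy in enumerate(energies) if energy - lowest <= energy_gap]


print(mutation_probability(5))                     # 0.2
print(within_energy_gap([10.0, 12.5, 16.0], 5.0))  # [0, 1]
```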
oddt.toolkits.rdk module
rdkit - A Cinfony module for accessing the RDKit from CPython
- Global variables:
- Chem and AllChem – the underlying RDKit Python bindings
- informats – a dictionary of supported input formats
- outformats – a dictionary of supported output formats
- descs – a list of supported descriptors
- fps – a list of supported fingerprint types
- forcefields – a list of supported forcefields
-
class
oddt.toolkits.rdk.
Atom
(Atom) Bases:
object
Represent an RDKit Atom.
- Required parameters:
Atom – an RDKit Atom
- Attributes:
atomicnum, coords, formalcharge
- The original RDKit Atom can be accessed using the attribute:
Atom
- Attributes
-
property
atomicnum
-
property
bonds
-
property
coords
-
property
formalcharge
-
property
idx
DEPRECATED: RDKit is 0-based and OpenBabel is 1-based. State which convention you desire and use idx0 or idx1.
Note that this index is 1-based, whereas RDKit’s internal index is 0-based. Changed to be compatible with OpenBabel.
-
property
idx0
Note that this index is 0-based, like RDKit’s internal index.
-
property
idx1
Note that this index is 1-based, whereas RDKit’s internal index is 0-based. Changed to be compatible with OpenBabel.
-
property
neighbors
-
property
partialcharge
-
class
oddt.toolkits.rdk.
Bond
(Bond) Bases:
object
- Attributes
- atoms
- isrotor
- order
-
property
atoms
-
property
isrotor
-
property
order
-
class
oddt.toolkits.rdk.
Fingerprint
(fingerprint) Bases:
object
A Molecular Fingerprint.
- Required parameters:
fingerprint – a vector calculated by one of the fingerprint methods
- Attributes:
- fp – the underlying fingerprint object
- bits – a list of bits set in the Fingerprint
- Methods:
The "|" operator can be used to calculate the Tanimoto coefficient. For example, given two Fingerprints a and b, the Tanimoto coefficient is given by:
tanimoto = a | b
- Attributes
- raw
-
property
raw
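For binary fingerprints the Tanimoto coefficient is the number of shared set bits divided by the number of bits set in either fingerprint, |A ∩ B| / |A ∪ B|. The sketch below reproduces the documented `a | b` idiom with a hypothetical `BitFingerprint` class built on Python sets; it is not the ODDT or RDKit class, and returning 0.0 for two empty fingerprints is an assumption.

```python
class BitFingerprint:
    """Hypothetical fingerprint holding the indices of its set bits.

    Mirrors only the documented idea that ``a | b`` yields the Tanimoto
    coefficient of two fingerprints.
    """

    def __init__(self, bits):
        self.bits = sorted(set(bits))

    def __or__(self, other):
        a, b = set(self.bits), set(other.bits)
        union = a | b
        if not union:
            # Two empty fingerprints: similarity defined as 0.0 (assumption).
            return 0.0
        # Tanimoto = |A intersect B| / |A union B| for binary fingerprints.
        return len(a & b) / len(union)


a = BitFingerprint([1, 5, 9, 12])
b = BitFingerprint([1, 5, 7])
print(a | b)  # 0.4: two shared bits out of five distinct set bits
```

Overloading `|` keeps call sites short, at the cost of hiding that the result is a float rather than a combined fingerprint, as it would be for a bitwise OR.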
-
class
oddt.toolkits.rdk.
Molecule
(Mol=-1, source=None, *args, **kwargs) Bases:
object
Trap RDKit molecules which are ‘None’
- Attributes
- Mol
- atom_dict
- atoms
- bonds
canonic_order
Returns np.array with canonic order of heavy atoms in the molecule
- charges
- clone
- coords
- data
- formula
- molwt
- num_rotors
protein
A flag identifying protein molecules, for which atom_dict procedures may differ.
- res_dict
- residues
- ring_dict
- smiles
- sssr
- title
Methods
addh
([only_polar]) Add hydrogens.
calccharges
([model]) Calculate partial charges for a molecule.
calcdesc
([descnames]) Calculate descriptor values.
calcfp
([fptype, opt]) Calculate a molecular fingerprint.
localopt
([forcefield, steps]) Locally optimize the coordinates.
make2D
() Generate 2D coordinates for the molecule.
make3D
([forcefield, steps]) Generate 3D coordinates.
removeh
(**kwargs) Remove hydrogens.
write
([format, filename, overwrite, size]) Write the molecule to a file or return a string.
clone_coords
-
property
Mol
-
property
atom_dict
-
property
atoms
-
property
bonds
-
calccharges
(model='gasteiger') Calculate partial charges for a molecule. By default the Gasteiger charge model is used.
- Parameters
- model : str (default="gasteiger")
Method for generating partial charges. Supported models: gasteiger, mmff94.
-
calcdesc
(descnames=None) Calculate descriptor values.
- Optional parameter:
descnames – a list of names of descriptors
If descnames is not specified, all available descriptors are calculated. See the descs variable for a list of available descriptors.
-
calcfp
(fptype='rdkit', opt=None) Calculate a molecular fingerprint.
- Optional parameters:
- fptype – the fingerprint type (default is "rdkit"). See the fps variable for a list of available fingerprint types.
- opt – a dictionary of options for fingerprints. Currently only used for radius and bitInfo in Morgan fingerprints.
-
property
canonic_order
Returns np.array with canonic order of heavy atoms in the molecule
-
property
charges
-
property
clone
-
property
coords
-
property
data
-
property
formula
-
localopt
(forcefield='uff', steps=500) Locally optimize the coordinates.
- Optional parameters:
- forcefield – default is "uff". See the forcefields variable for a list of available forcefields.
- steps – default is 500
If the molecule does not have any coordinates, make3D() is called before the optimization.
-
make3D
(forcefield='mmff94', steps=50) Generate 3D coordinates.
- Optional parameters:
- forcefield – default is "mmff94". See the forcefields variable for a list of available forcefields.
- steps – default is 50
Once coordinates are generated, a quick local optimization is carried out with the given forcefield and number of steps. Call localopt() if you want to improve the coordinates further.
-
property
molwt
-
property
num_rotors
-
property
protein
A flag identifying protein molecules, for which atom_dict procedures may differ.
-
property
res_dict
-
property
residues
-
property
ring_dict
-
property
smiles
-
property
sssr
-
property
title
-
write
(format='smi', filename=None, overwrite=False, size=None, **kwargs) Write the molecule to a file or return a string.
- Optional parameters:
- format – see the outformats variable for a list of available output formats (default is "smi")
- filename – default is None
- overwrite – if the output file already exists, should it be overwritten? (default is False)
If a filename is specified, the result is written to a file. Otherwise, a string containing the result is returned.
To write multiple molecules to the same file, use the Outputfile class.
-
class
oddt.toolkits.rdk.
MoleculeData
(Mol) Bases:
object
Store molecule data in a dictionary-type object
- Required parameters:
Mol – an RDKit Mol
Methods and accessor methods are like those of a dictionary except that the data is retrieved on-the-fly from the underlying Mol.
Example:
>>> mol = next(readfile("sdf", 'head.sdf'))
>>> data = mol.data
>>> print(data)
{'Comment': 'CORINA 2.61 0041 25.10.2001', 'NSC': '1'}
>>> print(len(data), data.keys(), data.has_key("NSC"))
2 ['Comment', 'NSC'] True
>>> print(data['Comment'])
CORINA 2.61 0041 25.10.2001
>>> data['Comment'] = 'This is a new comment'
>>> for k,v in data.items():
...     print(k, "-->", v)
Comment --> This is a new comment
NSC --> 1
>>> del data['NSC']
>>> print(len(data), data.keys(), data.has_key("NSC"))
1 ['Comment'] False
Methods
clear
has_key
items
iteritems
keys
to_dict
update
values
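The "retrieved on-the-fly" design means the object holds no copy of the data: every lookup and assignment goes straight to the underlying molecule. A minimal sketch of that pattern uses `collections.abc.MutableMapping`, which derives `keys`, `items`, `values`, `update`, `clear` and `in` from five core methods. `LiveData` is hypothetical and wraps a plain dict standing in for the Mol's properties; it is not ODDT's MoleculeData.

```python
from collections.abc import MutableMapping


class LiveData(MutableMapping):
    """Hypothetical dictionary-like view over an underlying property store.

    Nothing is copied or cached: every lookup, assignment and deletion is
    forwarded to `store`, a plain dict standing in for the molecule's
    properties.
    """

    def __init__(self, store):
        self._store = store

    def __getitem__(self, key):
        return self._store[key]

    def __setitem__(self, key, value):
        self._store[key] = value

    def __delitem__(self, key):
        del self._store[key]

    def __iter__(self):
        # Iterate over a snapshot of the keys so deletions during a loop are safe.
        return iter(list(self._store))

    def __len__(self):
        return len(self._store)

    def has_key(self, key):
        # Python 2 style alias, kept because it appears in the method list.
        return key in self

    def to_dict(self):
        return dict(self.items())


props = {"Comment": "CORINA 2.61 0041 25.10.2001", "NSC": "1"}
data = LiveData(props)
data["Comment"] = "This is a new comment"
del data["NSC"]
print(len(data), list(data), data.has_key("NSC"))  # 1 ['Comment'] False

# Changes made directly to the store are visible through the view at once.
props["Source"] = "added directly"
print(data.to_dict())
```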
-
class
oddt.toolkits.rdk.
Outputfile
(format, filename, overwrite=False, **kwargs) Bases:
object
Represent a file to which output is to be sent.
- Required parameters:
- format – see the outformats variable for a list of available output formats
- filename
- Optional parameters:
- overwrite – if the output file already exists, should it be overwritten? (default is False)
- Methods:
write(molecule), close()
Methods
close
() Close the Outputfile to further writing.
write
(molecule) Write a molecule to the output file.
-
class
oddt.toolkits.rdk.
Residue
(ParentMol, atom_path, idx=0) Bases:
object
Represent an RDKit residue.
- Required parameters:
- ParentMol – parent molecule (Mol) object
- atom_path – path of the atoms forming the residue
- Attributes:
atoms, idx, name.
(refer to the Open Babel library documentation for more info).
- The Mol object constructed from the residue’s atoms can be accessed using the attribute:
Residue
- Attributes
-
property
atoms
List of Atoms in the Residue
-
property
chain
Residue chain ID
-
property
idx
DEPRECATED: Use idx0 instead.
Internal index (0-based) of the Residue
-
property
idx0
Internal index (0-based) of the Residue
-
property
name
Residue name
-
property
number
Residue number
-
class
oddt.toolkits.rdk.
Smarts
(smartspattern) Bases:
object
Initialise with a SMARTS pattern.
Methods
findall
(molecule[, unique]) Find all matches of the SMARTS pattern to a particular molecule.
match
(molecule) Check whether there is any match of the SMARTS pattern to a particular molecule.
-
oddt.toolkits.rdk.
base_feature_factory
= <rdkit.Chem.rdMolChemicalFeatures.MolChemicalFeatureFactory object> Global feature factory based on BaseFeatures.fdef
-
oddt.toolkits.rdk.
descs
= ['MaxEStateIndex', 'MinEStateIndex', 'MaxAbsEStateIndex', 'MinAbsEStateIndex', 'qed', 'MolWt', 'HeavyAtomMolWt', 'ExactMolWt', 'NumValenceElectrons', 'NumRadicalElectrons', 'MaxPartialCharge', 'MinPartialCharge', 'MaxAbsPartialCharge', 'MinAbsPartialCharge', 'FpDensityMorgan1', 'FpDensityMorgan2', 'FpDensityMorgan3', 'BCUT2D_MWHI', 'BCUT2D_MWLOW', 'BCUT2D_CHGHI', 'BCUT2D_CHGLO', 'BCUT2D_LOGPHI', 'BCUT2D_LOGPLOW', 'BCUT2D_MRHI', 'BCUT2D_MRLOW', 'BalabanJ', 'BertzCT', 'Chi0', 'Chi0n', 'Chi0v', 'Chi1', 'Chi1n', 'Chi1v', 'Chi2n', 'Chi2v', 'Chi3n', 'Chi3v', 'Chi4n', 'Chi4v', 'HallKierAlpha', 'Ipc', 'Kappa1', 'Kappa2', 'Kappa3', 'LabuteASA', 'PEOE_VSA1', 'PEOE_VSA10', 'PEOE_VSA11', 'PEOE_VSA12', 'PEOE_VSA13', 'PEOE_VSA14', 'PEOE_VSA2', 'PEOE_VSA3', 'PEOE_VSA4', 'PEOE_VSA5', 'PEOE_VSA6', 'PEOE_VSA7', 'PEOE_VSA8', 'PEOE_VSA9', 'SMR_VSA1', 'SMR_VSA10', 'SMR_VSA2', 'SMR_VSA3', 'SMR_VSA4', 'SMR_VSA5', 'SMR_VSA6', 'SMR_VSA7', 'SMR_VSA8', 'SMR_VSA9', 'SlogP_VSA1', 'SlogP_VSA10', 'SlogP_VSA11', 'SlogP_VSA12', 'SlogP_VSA2', 'SlogP_VSA3', 'SlogP_VSA4', 'SlogP_VSA5', 'SlogP_VSA6', 'SlogP_VSA7', 'SlogP_VSA8', 'SlogP_VSA9', 'TPSA', 'EState_VSA1', 'EState_VSA10', 'EState_VSA11', 'EState_VSA2', 'EState_VSA3', 'EState_VSA4', 'EState_VSA5', 'EState_VSA6', 'EState_VSA7', 'EState_VSA8', 'EState_VSA9', 'VSA_EState1', 'VSA_EState10', 'VSA_EState2', 'VSA_EState3', 'VSA_EState4', 'VSA_EState5', 'VSA_EState6', 'VSA_EState7', 'VSA_EState8', 'VSA_EState9', 'FractionCSP3', 'HeavyAtomCount', 'NHOHCount', 'NOCount', 'NumAliphaticCarbocycles', 'NumAliphaticHeterocycles', 'NumAliphaticRings', 'NumAromaticCarbocycles', 'NumAromaticHeterocycles', 'NumAromaticRings', 'NumHAcceptors', 'NumHDonors', 'NumHeteroatoms', 'NumRotatableBonds', 'NumSaturatedCarbocycles', 'NumSaturatedHeterocycles', 'NumSaturatedRings', 'RingCount', 'MolLogP', 'MolMR', 'fr_Al_COO', 'fr_Al_OH', 'fr_Al_OH_noTert', 'fr_ArN', 'fr_Ar_COO', 'fr_Ar_N', 'fr_Ar_NH', 'fr_Ar_OH', 'fr_COO', 'fr_COO2', 'fr_C_O', 'fr_C_O_noCOO', 'fr_C_S', 
'fr_HOCCN', 'fr_Imine', 'fr_NH0', 'fr_NH1', 'fr_NH2', 'fr_N_O', 'fr_Ndealkylation1', 'fr_Ndealkylation2', 'fr_Nhpyrrole', 'fr_SH', 'fr_aldehyde', 'fr_alkyl_carbamate', 'fr_alkyl_halide', 'fr_allylic_oxid', 'fr_amide', 'fr_amidine', 'fr_aniline', 'fr_aryl_methyl', 'fr_azide', 'fr_azo', 'fr_barbitur', 'fr_benzene', 'fr_benzodiazepine', 'fr_bicyclic', 'fr_diazo', 'fr_dihydropyridine', 'fr_epoxide', 'fr_ester', 'fr_ether', 'fr_furan', 'fr_guanido', 'fr_halogen', 'fr_hdrzine', 'fr_hdrzone', 'fr_imidazole', 'fr_imide', 'fr_isocyan', 'fr_isothiocyan', 'fr_ketone', 'fr_ketone_Topliss', 'fr_lactam', 'fr_lactone', 'fr_methoxy', 'fr_morpholine', 'fr_nitrile', 'fr_nitro', 'fr_nitro_arom', 'fr_nitro_arom_nonortho', 'fr_nitroso', 'fr_oxazole', 'fr_oxime', 'fr_para_hydroxylation', 'fr_phenol', 'fr_phenol_noOrthoHbond', 'fr_phos_acid', 'fr_phos_ester', 'fr_piperdine', 'fr_piperzine', 'fr_priamide', 'fr_prisulfonamd', 'fr_pyridine', 'fr_quatN', 'fr_sulfide', 'fr_sulfonamd', 'fr_sulfone', 'fr_term_acetylene', 'fr_tetrazole', 'fr_thiazole', 'fr_thiocyan', 'fr_thiophene', 'fr_unbrch_alkane', 'fr_urea']¶ A list of supported descriptors
-
oddt.toolkits.rdk.
diverse_conformers_generator
(mol, n_conf=10, method='etkdg', seed=None, rmsd=0.5) Produce diverse conformers using the current conformer as a starting point. Each conformer is a copy of the original molecule object.
New in version 0.6.
- Parameters
- mol : oddt.toolkit.Molecule object
Molecule for which to generate conformers
- n_conf : int (default=10)
Target number of conformers
- method : string (default='etkdg')
Method for generating conformers. Supported methods: "etkdg", "etdg", "kdg", "dg".
- seed : None or int (default=None)
Random seed
- rmsd : float (default=0.5)
The minimum RMSD separating retained conformers; conformers closer than this are pruned.
- Returns
- mols : list of oddt.toolkit.Molecule objects
Molecules with diverse conformers
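The rmsd parameter describes diversity pruning: a conformer is kept only if it is farther than the cutoff from every conformer already kept. The sketch below shows a greedy version of that idea in plain Python. The helper names are hypothetical, and it computes RMSD on coordinates as given, without superimposing the conformers first; the toolkit may align conformers before comparing them, so treat this only as an illustration of the pruning rule.

```python
import math


def rmsd(a, b):
    """RMSD between two equally ordered coordinate lists, without superposition."""
    if len(a) != len(b):
        raise ValueError("conformers must have the same number of atoms")
    squared = sum((p - q) ** 2 for xa, xb in zip(a, b) for p, q in zip(xa, xb))
    return math.sqrt(squared / len(a))


def prune_by_rmsd(conformers, cutoff=0.5):
    """Greedily keep conformers whose RMSD to every kept one exceeds cutoff."""
    kept = []
    for conf in conformers:
        if all(rmsd(conf, other) > cutoff for other in kept):
            kept.append(conf)
    return kept


c1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
c2 = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)]  # RMSD 0.1 from c1: pruned
c3 = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]  # RMSD about 1.41 from c1: kept
print(len(prune_by_rmsd([c1, c2, c3], cutoff=0.5)))  # 2
```

Greedy pruning depends on input order; feeding conformers sorted by energy keeps the lowest-energy member of each cluster.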
-
oddt.toolkits.rdk.
forcefields
= ['uff', 'mmff94'] A list of supported forcefields
-
oddt.toolkits.rdk.
fps
= ['rdkit', 'layered', 'maccs', 'atompairs', 'torsions', 'morgan'] A list of supported fingerprint types
-
oddt.toolkits.rdk.
informats
= {'inchi': 'InChI', 'mol': 'MDL MOL file', 'mol2': 'Tripos MOL2 file', 'sdf': 'MDL SDF file', 'smi': 'SMILES'} A dictionary of supported input formats
-
oddt.toolkits.rdk.
outformats
= {'can': 'Canonical SMILES', 'inchi': 'InChI', 'inchikey': 'InChIKey', 'mol': 'MDL MOL file', 'sdf': 'MDL SDF file', 'smi': 'SMILES'} A dictionary of supported output formats
-
oddt.toolkits.rdk.
readfile
(format, filename, lazy=False, opt=None, **kwargs) Iterate over the molecules in a file.
- Required parameters:
- format – see the informats variable for a list of available input formats
- filename
You can access the first molecule in a file by calling next() on the iterator:
mol = next(readfile("smi", "myfile.smi"))
- You can make a list of the molecules in a file using:
mols = list(readfile("smi", "myfile.smi"))
You can iterate over the molecules in a file as shown in the following code snippet:
>>> atomtotal = 0
>>> for mol in readfile("sdf", "head.sdf"):
...     atomtotal += len(mol.atoms)
...
>>> print(atomtotal)
43
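Because readfile() returns an iterator, next() pulls only the first record while list() reads the whole file. The generator below is a hypothetical stand-in that shows the same iterator protocol on a simple whitespace-separated SMILES file; it only splits each line into a (smiles, title) pair and does not parse chemistry or reproduce readfile() itself.

```python
import os
import tempfile


def read_smiles_records(filename):
    """Hypothetical lazy reader: yields (smiles, title) pairs one line at a time."""
    with open(filename) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            title = fields[1] if len(fields) > 1 else ""
            yield fields[0], title


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "myfile.smi")
    with open(path, "w") as handle:
        handle.write("CCO ethanol\nc1ccccc1 benzene\n")
    first = next(read_smiles_records(path))       # reads only the first record
    everything = list(read_smiles_records(path))  # reads every record

print(first)
print(everything)
```

A generator keeps memory use flat for large screening libraries; materialising with list() is only worthwhile when the molecules are needed more than once.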