oddt.toolkits package¶
Subpackages¶
Submodules¶
oddt.toolkits.common module¶
Code common to all toolkits
oddt.toolkits.ob module¶
-
class
oddt.toolkits.ob.Atom(OBAtom)[source]¶ Bases:
pybel.Atom
Attributes
atomicmass, atomicnum, bonds, cidx, coordidx, coords, exactmass, formalcharge, heavyvalence, heterovalence, hyb, idx (DEPRECATED: RDKit is 0-based and OpenBabel is 1-based), idx0 (0-based, whereas OpenBabel’s internal index is 1-based), idx1 (1-based, like OpenBabel’s internal index), implicitvalence, isotope, neighbors, partialcharge, residue, spin, type, valence, vector
-
atomicmass¶
-
atomicnum¶
-
bonds¶
-
cidx¶
-
coordidx¶
-
coords¶
-
exactmass¶
-
formalcharge¶
-
heavyvalence¶
-
heterovalence¶
-
hyb¶
-
idx¶ DEPRECATED: RDKit is 0-based and OpenBabel is 1-based. State which convention you desire and use idx0 or idx1.
Note that this index is 1-based as OpenBabel’s internal index.
-
idx0¶ Note that this index is 0-based, whereas OpenBabel’s internal index is 1-based. Changed to be compatible with RDKit.
-
idx1¶ Note that this index is 1-based as OpenBabel’s internal index.
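The difference between the two conventions can be illustrated without either toolkit. The following is a minimal sketch; the helper names to_idx0 and to_idx1 are hypothetical and are not part of oddt. It only shows that the 1-based index is the 0-based index plus one, which matters when an index is used to address a Python list or NumPy array.

```python
def to_idx0(idx1):
    """Convert a 1-based (OpenBabel-style) index to a 0-based (RDKit-style) one."""
    if idx1 < 1:
        raise ValueError("1-based indices start at 1")
    return idx1 - 1


def to_idx1(idx0):
    """Convert a 0-based (RDKit-style) index to a 1-based (OpenBabel-style) one."""
    if idx0 < 0:
        raise ValueError("0-based indices start at 0")
    return idx0 + 1


# Heavy atoms of ethanol stored in an ordinary (0-based) Python list.
symbols = ["C", "C", "O"]

# An index reported in the 1-based convention must be shifted before
# it is used to address the list.
oxygen_idx1 = 3
print(symbols[to_idx0(oxygen_idx1)])
```

Using idx0 or idx1 explicitly, rather than the deprecated idx, makes the intended convention visible at the call site.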
-
implicitvalence¶
-
isotope¶
-
neighbors¶
-
partialcharge¶
-
residue¶
-
spin¶
-
type¶
-
valence¶
-
vector¶
-
-
class
oddt.toolkits.ob.Bond(OBBond)[source]¶ Bases:
object
Attributes
atoms, isrotor, order
-
atoms¶
-
isrotor¶
-
order¶
-
-
class
oddt.toolkits.ob.Fingerprint(fingerprint)[source]¶ Bases:
pybel.Fingerprint
Attributes
bits, raw
-
bits¶
-
raw¶
-
-
class
oddt.toolkits.ob.Molecule(OBMol=None, source=None, protein=False)[source]¶ Bases:
pybel.Molecule
Attributes
OBMol, atom_dict, atoms, bonds, canonic_order (returns np.array with canonic order of heavy atoms in the molecule), charge, charges, clone, conformers, coords, data, dim, energy, exactmass, formula, molwt, num_rotors (number of strict rotatable bonds), protein (a flag identifying protein molecules, for which atom_dict procedures may differ), res_dict, residues, ring_dict, smiles, spin, sssr, title, unitcell
Methods
addh([only_polar]): Add hydrogens.
calccharges([model]): Estimates atomic partial charges in the molecule.
calcdesc([descnames]): Calculate descriptor values.
calcfp([fptype]): Calculate a molecular fingerprint.
clone_coords(source)
convertdbonds(): Convert Dative Bonds.
draw([show, filename, update, usecoords]): Create a 2D depiction of the molecule.
localopt([forcefield, steps]): Locally optimize the coordinates.
make2D(): Generate 2D coordinates for molecule.
make3D([forcefield, steps]): Generate 3D coordinates.
removeh(): Remove hydrogens.
write([format, filename, overwrite, opt, size])
-
OBMol¶
-
atom_dict¶
-
atoms¶
-
bonds¶
-
calccharges(model='mmff94')¶ Estimates atomic partial charges in the molecule.
- Optional parameters:
- model – default is “mmff94”. See the charges variable for a list of available charge models (in shell, obabel -L charges)
This method populates the partialcharge attribute of each atom in the molecule in place.
-
calcdesc(descnames=[])¶ Calculate descriptor values.
- Optional parameter:
- descnames – a list of names of descriptors
If descnames is not specified, all available descriptors are calculated. See the descs variable for a list of available descriptors.
-
calcfp(fptype='FP2')¶ Calculate a molecular fingerprint.
- Optional parameters:
- fptype – the fingerprint type (default is “FP2”). See the fps variable for a list of available fingerprint types.
-
canonic_order¶ Returns np.array with canonic order of heavy atoms in the molecule
-
charge¶
-
charges¶
-
clone¶
-
conformers¶
-
convertdbonds()¶ Convert Dative Bonds.
-
coords¶
-
data¶
-
dim¶
-
draw(show=True, filename=None, update=False, usecoords=False)¶ Create a 2D depiction of the molecule.
- Optional parameters:
- show – display on screen (default is True)
- filename – write to file (default is None)
- update – update the coordinates of the atoms to those determined by the structure diagram generator (default is False)
- usecoords – don’t calculate 2D coordinates, just use the current coordinates (default is False)
Tkinter and Python Imaging Library are required for image display.
-
energy¶
-
exactmass¶
-
formula¶
-
localopt(forcefield='mmff94', steps=500)¶ Locally optimize the coordinates.
- Optional parameters:
- forcefield – default is “mmff94”. See the forcefields variable for a list of available forcefields.
- steps – default is 500
If the molecule does not have any coordinates, make3D() is called before the optimization. Note that the molecule needs to have explicit hydrogens. If not, call addh().
-
molwt¶
-
num_rotors¶ Number of strict rotatable bonds
-
protein¶ A flag identifying protein molecules, for which atom_dict procedures may differ.
-
res_dict¶
-
residues¶
-
ring_dict¶
-
smiles¶
-
spin¶
-
sssr¶
-
title¶
-
unitcell¶
-
-
class
oddt.toolkits.ob.MoleculeData(obmol)[source]¶ Bases:
pybel.MoleculeData
Methods
clear(), has_key(key), items(), iteritems(), keys(), to_dict(), update(dictionary), values()
-
clear()¶
-
has_key(key)¶
-
items()¶
-
iteritems()¶
-
keys()¶
-
update(dictionary)¶
-
values()¶
-
-
class
oddt.toolkits.ob.Outputfile(format, filename, overwrite=False, opt=None)[source]¶ Bases:
pybel.Outputfile
Methods
close(): Close the Outputfile to further writing.
write(molecule): Write a molecule to the output file.
-
close()¶ Close the Outputfile to further writing.
-
write(molecule)¶ Write a molecule to the output file.
- Required parameters:
- molecule
-
-
class
oddt.toolkits.ob.Residue(OBResidue)[source]¶ Bases:
object
Represent a Pybel residue.
- Required parameter:
- OBResidue – an Open Babel OBResidue
- Attributes:
- atoms, idx, name.
(refer to the Open Babel library documentation for more info).
- The original Open Babel atom can be accessed using the attribute:
- OBResidue
Attributes
atoms, idx, name
-
atoms¶
-
idx¶
-
name¶
-
class
oddt.toolkits.ob.Smarts(smartspattern)[source]¶ Bases:
pybel.Smarts
Initialise with a SMARTS pattern.
Methods
findall(molecule[, unique]): Find all matches of the SMARTS pattern to a particular molecule.
match(molecule): Checks if there is any match.
oddt.toolkits.rdk module¶
rdkit - A Cinfony module for accessing the RDKit from CPython
- Global variables:
- Chem and AllChem – the underlying RDKit Python bindings
- informats – a dictionary of supported input formats
- outformats – a dictionary of supported output formats
- descs – a list of supported descriptors
- fps – a list of supported fingerprint types
- forcefields – a list of supported forcefields
-
class
oddt.toolkits.rdk.Atom(Atom)[source]¶ Bases:
object
Represent an RDKit Atom.
- Required parameters:
- Atom – an RDKit Atom
- Attributes:
- atomicnum, coords, formalcharge
- The original RDKit Atom can be accessed using the attribute:
- Atom
Attributes
atomicnum, bonds, coords, formalcharge, idx (DEPRECATED: RDKit is 0-based and OpenBabel is 1-based), idx0 (0-based, like RDKit’s internal index), idx1 (1-based, whereas RDKit’s internal index is 0-based), neighbors, partialcharge
-
atomicnum¶
-
bonds¶
-
coords¶
-
formalcharge¶
-
idx¶ DEPRECATED: RDKit is 0-based and OpenBabel is 1-based. State which convention you desire and use idx0 or idx1.
Note that this index is 1-based, whereas RDKit’s internal index is 0-based. Changed to be compatible with OpenBabel.
-
idx0¶ Note that this index is 0-based, like RDKit’s internal index.
-
idx1¶ Note that this index is 1-based, whereas RDKit’s internal index is 0-based. Changed to be compatible with OpenBabel.
-
neighbors¶
-
partialcharge¶
-
class
oddt.toolkits.rdk.Bond(Bond)[source]¶ Bases:
object
Attributes
atoms, isrotor, order
-
atoms¶
-
isrotor¶
-
order¶
-
-
class
oddt.toolkits.rdk.Fingerprint(fingerprint)[source]¶ Bases:
object
A Molecular Fingerprint.
- Required parameters:
- fingerprint – a vector calculated by one of the fingerprint methods
- Attributes:
- fp – the underlying fingerprint object
- bits – a list of bits set in the Fingerprint
- Methods:
The “|” operator can be used to calculate the Tanimoto coefficient. For example, given two Fingerprints ‘a’ and ‘b’, the Tanimoto coefficient is given by:
tanimoto = a | b
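For reference, the Tanimoto coefficient of two bit fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below shows how a “|” operator can return that value; BitFingerprint is a hypothetical stand-in written for illustration, not the oddt class, and it makes no claim about how the toolkit computes the coefficient internally.

```python
class BitFingerprint:
    """Hypothetical stand-in for a fingerprint: stores the positions of set bits."""

    def __init__(self, bits):
        self.bits = frozenset(bits)

    def __or__(self, other):
        """Return the Tanimoto coefficient: |A and B| / |A or B|."""
        union = len(self.bits | other.bits)
        if union == 0:
            # Convention chosen for this sketch: two empty fingerprints score 0.
            return 0.0
        return len(self.bits & other.bits) / union


a = BitFingerprint([1, 5, 9, 12])
b = BitFingerprint([1, 5, 10])

# Two bits (1 and 5) are shared; five distinct bits are set overall.
tanimoto = a | b
print(tanimoto)
```

Overloading “|” keeps the call site short, at the cost of giving the operator a meaning that differs from its usual bitwise-or reading.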
Attributes
raw
-
raw¶
-
class
oddt.toolkits.rdk.Molecule(Mol=None, source=None, protein=False)[source]¶ Bases:
object
Trap RDKit molecules which are ‘None’
Attributes
Mol, atom_dict, atoms, bonds, canonic_order (returns np.array with canonic order of heavy atoms in the molecule), charges, clone, coords, data, formula, molwt, num_rotors, protein (a flag identifying protein molecules, for which atom_dict procedures may differ), res_dict, residues, ring_dict, smiles, sssr, title
Methods
addh([only_polar]): Add hydrogens.
calcdesc([descnames]): Calculate descriptor values.
calcfp([fptype, opt]): Calculate a molecular fingerprint.
clone_coords(source)
localopt([forcefield, steps]): Locally optimize the coordinates.
make2D(): Generate 2D coordinates for molecule.
make3D([forcefield, steps]): Generate 3D coordinates.
removeh(**kwargs): Remove hydrogens.
write([format, filename, overwrite, size]): Write the molecule to a file or return a string.
-
Mol¶
-
atom_dict¶
-
atoms¶
-
bonds¶
-
calcdesc(descnames=None)[source]¶ Calculate descriptor values.
- Optional parameter:
- descnames – a list of names of descriptors
If descnames is not specified, all available descriptors are calculated. See the descs variable for a list of available descriptors.
-
calcfp(fptype='rdkit', opt=None)[source]¶ Calculate a molecular fingerprint.
- Optional parameters:
- fptype – the fingerprint type (default is “rdkit”). See the fps variable for a list of available fingerprint types.
- opt – a dictionary of options for fingerprints. Currently only used for radius and bitInfo in Morgan fingerprints.
-
canonic_order¶ Returns np.array with canonic order of heavy atoms in the molecule
-
charges¶
-
clone¶
-
coords¶
-
data¶
-
formula¶
-
localopt(forcefield='uff', steps=500)[source]¶ Locally optimize the coordinates.
- Optional parameters:
- forcefield – default is “uff”. See the forcefields variable for a list of available forcefields.
- steps – default is 500
If the molecule does not have any coordinates, make3D() is called before the optimization.
-
make3D(forcefield='mmff94', steps=50)[source]¶ Generate 3D coordinates.
- Optional parameters:
- forcefield – default is “mmff94” (as given in the signature). See the forcefields variable for a list of available forcefields.
- steps – default is 50
Once coordinates are generated, a quick local optimization is carried out with the given forcefield and number of steps (50 by default). Call localopt() if you want to improve the coordinates further.
-
molwt¶
-
num_rotors¶
-
protein¶ A flag identifying protein molecules, for which atom_dict procedures may differ.
-
res_dict¶
-
residues¶
-
ring_dict¶
-
smiles¶
-
sssr¶
-
title¶
-
write(format='smi', filename=None, overwrite=False, size=None, **kwargs)[source]¶ Write the molecule to a file or return a string.
- Optional parameters:
- format – see the outformats variable for a list of available output formats (default is “smi”)
- filename – default is None
- overwrite – if the output file already exists, should it be overwritten? (default is False)
If a filename is specified, the result is written to a file. Otherwise, a string is returned containing the result.
To write multiple molecules to the same file you should use the Outputfile class.
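The file-or-string behaviour described above can be sketched independently of any toolkit. In the example below, write_text is a hypothetical helper, not the oddt method, and raising IOError when the file exists and overwrite is False is an assumption made for illustration; the documentation above does not state what happens in that case.

```python
import os
import tempfile


def write_text(text, filename=None, overwrite=False):
    """Hypothetical helper: return text if no filename is given, else write it."""
    if filename is None:
        return text
    if not overwrite and os.path.isfile(filename):
        # Assumed behaviour: refuse to clobber an existing file.
        raise IOError("%s already exists; pass overwrite=True to replace it" % filename)
    with open(filename, "w") as handle:
        handle.write(text)
    return None


smiles_line = "CCO\tethanol\n"

# No filename: the serialized text comes back as a string.
as_string = write_text(smiles_line)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "ethanol.smi")
    write_text(smiles_line, filename=path)       # first write succeeds
    try:
        write_text(smiles_line, filename=path)   # second write is refused
        refused = False
    except IOError:
        refused = True
    write_text(smiles_line, filename=path, overwrite=True)
    with open(path) as handle:
        on_disk = handle.read()
```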
-
-
class
oddt.toolkits.rdk.MoleculeData(Mol)[source]¶ Bases:
object
Store molecule data in a dictionary-type object
- Required parameters:
- Mol – an RDKit Mol
Methods and accessor methods are like those of a dictionary except that the data is retrieved on-the-fly from the underlying Mol.
Example:
>>> mol = next(readfile("sdf", 'head.sdf'))
>>> data = mol.data
>>> print(data)
{'Comment': 'CORINA 2.61 0041 25.10.2001', 'NSC': '1'}
>>> print(len(data), data.keys(), data.has_key("NSC"))
2 ['Comment', 'NSC'] True
>>> print(data['Comment'])
CORINA 2.61 0041 25.10.2001
>>> data['Comment'] = 'This is a new comment'
>>> for k,v in data.items():
...     print(k, "-->", v)
Comment --> This is a new comment
NSC --> 1
>>> del data['NSC']
>>> print(len(data), data.keys(), data.has_key("NSC"))
1 ['Comment'] False
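The idea of retrieving data on-the-fly from the underlying molecule can be sketched with the standard library alone. FakeMol and OnTheFlyData below are hypothetical names used only for illustration; the real class wraps an RDKit Mol and may differ in detail. The point is that the wrapper holds no copy of the data, so changes made on either side are immediately visible on the other.

```python
from collections.abc import MutableMapping


class FakeMol:
    """Hypothetical stand-in for a toolkit molecule that stores properties."""

    def __init__(self):
        self.props = {}


class OnTheFlyData(MutableMapping):
    """Dictionary-like view that reads and writes the molecule's properties."""

    def __init__(self, mol):
        self._mol = mol  # keep a reference, never a copy of the data

    def __getitem__(self, key):
        return self._mol.props[key]

    def __setitem__(self, key, value):
        self._mol.props[key] = value

    def __delitem__(self, key):
        del self._mol.props[key]

    def __iter__(self):
        return iter(list(self._mol.props))

    def __len__(self):
        return len(self._mol.props)

    def to_dict(self):
        """Return an ordinary dict snapshot of the current data."""
        return dict(self.items())


mol = FakeMol()
data = OnTheFlyData(mol)

data["NSC"] = "1"                                  # written through the view
mol.props["Comment"] = "set on the molecule"       # written on the molecule
snapshot = data.to_dict()                          # both entries are visible
del data["NSC"]                                    # deletion reaches the molecule
```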
Methods
clear(), has_key(key), items(), iteritems(), keys(), to_dict(), update(dictionary), values()
-
class
oddt.toolkits.rdk.Outputfile(format, filename, overwrite=False)[source]¶ Bases:
object
Represent a file to which output is to be sent.
- Required parameters:
- format – see the outformats variable for a list of available output formats
- filename
- Optional parameters:
- overwrite – if the output file already exists, should it be overwritten? (default is False)
- Methods:
- write(molecule), close()
Methods
close(): Close the Outputfile to further writing.
write(molecule): Write a molecule to the output file.
-
class
oddt.toolkits.rdk.Residue(ParentMol, atom_path)[source]¶ Bases:
object
Represent an RDKit residue.
- Required parameters:
- ParentMol – Parent molecule (Mol) object
- path – atoms path of a residue
- Attributes:
- atoms, idx, name.
(refer to the Open Babel library documentation for more info).
- The Mol object constructed from the residue’s atoms can be accessed using the attribute:
- Residue
Attributes
atoms, idx, name
-
atoms¶
-
idx¶
-
name¶
-
class
oddt.toolkits.rdk.Smarts(smartspattern)[source]¶ Bases:
object
Initialise with a SMARTS pattern.
Methods
findall(molecule[, unique]): Find all matches of the SMARTS pattern to a particular molecule.
match(molecule): Find all matches of the SMARTS pattern to a particular molecule.
-
oddt.toolkits.rdk.base_feature_factory= <rdkit.Chem.rdMolChemicalFeatures.MolChemicalFeatureFactory object>¶ Global feature factory based on BaseFeatures.fdef
-
oddt.toolkits.rdk.descs= ['fr_C_O_noCOO', 'PEOE_VSA3', 'Chi4v', 'fr_Ar_COO', 'fr_SH', 'Chi4n', 'SMR_VSA10', 'fr_para_hydroxylation', 'fr_barbitur', 'fr_Ar_NH', 'fr_halogen', 'fr_dihydropyridine', 'fr_priamide', 'SlogP_VSA4', 'fr_guanido', 'MinPartialCharge', 'fr_furan', 'fr_morpholine', 'fr_nitroso', 'NumAromaticCarbocycles', 'fr_COO2', 'fr_amidine', 'SMR_VSA7', 'fr_benzodiazepine', 'ExactMolWt', 'fr_Imine', 'MolWt', 'fr_hdrzine', 'fr_urea', 'NumAromaticRings', 'fr_quatN', 'NumSaturatedHeterocycles', 'NumAliphaticHeterocycles', 'fr_benzene', 'fr_phos_acid', 'fr_sulfone', 'VSA_EState10', 'fr_aniline', 'fr_N_O', 'fr_sulfonamd', 'fr_thiazole', 'TPSA', 'EState_VSA8', 'qed', 'PEOE_VSA14', 'PEOE_VSA13', 'PEOE_VSA12', 'PEOE_VSA11', 'PEOE_VSA10', 'BalabanJ', 'fr_lactone', 'fr_Al_COO', 'EState_VSA10', 'EState_VSA11', 'HeavyAtomMolWt', 'fr_nitro_arom', 'Chi0', 'Chi1', 'NumAliphaticRings', 'MolLogP', 'fr_nitro', 'fr_Al_OH', 'fr_azo', 'NumAliphaticCarbocycles', 'fr_C_O', 'fr_ether', 'fr_phenol_noOrthoHbond', 'fr_alkyl_halide', 'NumValenceElectrons', 'fr_aryl_methyl', 'fr_Ndealkylation2', 'MinEStateIndex', 'fr_term_acetylene', 'HallKierAlpha', 'fr_C_S', 'fr_thiocyan', 'fr_ketone_Topliss', 'VSA_EState4', 'Ipc', 'VSA_EState6', 'VSA_EState7', 'VSA_EState1', 'VSA_EState2', 'EState_VSA9', 'fr_HOCCN', 'fr_phos_ester', 'BertzCT', 'SlogP_VSA12', 'VSA_EState9', 'SlogP_VSA10', 'SlogP_VSA11', 'fr_COO', 'NHOHCount', 'fr_unbrch_alkane', 'NumSaturatedRings', 'MaxPartialCharge', 'fr_methoxy', 'fr_thiophene', 'SlogP_VSA8', 'SlogP_VSA9', 'MinAbsPartialCharge', 'SlogP_VSA5', 'SlogP_VSA6', 'SlogP_VSA7', 'SlogP_VSA1', 'SlogP_VSA2', 'SlogP_VSA3', 'NumRadicalElectrons', 'fr_NH2', 'fr_piperzine', 'fr_nitrile', 'NumHeteroatoms', 'fr_NH1', 'fr_NH0', 'MaxAbsEStateIndex', 'LabuteASA', 'fr_amide', 'Chi3n', 'fr_imidazole', 'SMR_VSA3', 'SMR_VSA2', 'SMR_VSA1', 'Chi3v', 'SMR_VSA6', 'Kappa3', 'Kappa2', 'EState_VSA6', 'EState_VSA7', 'SMR_VSA9', 'SMR_VSA8', 'EState_VSA2', 'EState_VSA3', 'fr_Ndealkylation1', 
'EState_VSA1', 'fr_ketone', 'SMR_VSA5', 'MinAbsEStateIndex', 'fr_diazo', 'SMR_VSA4', 'fr_Ar_N', 'fr_Nhpyrrole', 'fr_ester', 'VSA_EState5', 'EState_VSA4', 'NumHDonors', 'fr_prisulfonamd', 'fr_oxime', 'EState_VSA5', 'VSA_EState3', 'fr_isocyan', 'Chi2n', 'Chi2v', 'HeavyAtomCount', 'fr_azide', 'NumHAcceptors', 'fr_lactam', 'fr_allylic_oxid', 'VSA_EState8', 'fr_oxazole', 'fr_Ar_OH', 'fr_piperdine', 'FpDensityMorgan2', 'FpDensityMorgan3', 'FpDensityMorgan1', 'fr_sulfide', 'fr_alkyl_carbamate', 'NOCount', 'Chi1n', 'PEOE_VSA8', 'PEOE_VSA7', 'PEOE_VSA6', 'PEOE_VSA5', 'PEOE_VSA4', 'MaxEStateIndex', 'PEOE_VSA2', 'PEOE_VSA1', 'NumSaturatedCarbocycles', 'fr_imide', 'FractionCSP3', 'Chi1v', 'fr_Al_OH_noTert', 'fr_epoxide', 'fr_hdrzone', 'fr_isothiocyan', 'NumAromaticHeterocycles', 'fr_bicyclic', 'Kappa1', 'Chi0n', 'fr_phenol', 'MolMR', 'PEOE_VSA9', 'fr_aldehyde', 'fr_pyridine', 'fr_tetrazole', 'RingCount', 'fr_nitro_arom_nonortho', 'Chi0v', 'fr_ArN', 'NumRotatableBonds', 'MaxAbsPartialCharge']¶ A list of supported descriptors
-
oddt.toolkits.rdk.forcefields= ['mmff94', 'uff']¶ A list of supported forcefields
-
oddt.toolkits.rdk.fps= ['rdkit', 'layered', 'maccs', 'atompairs', 'torsions', 'morgan']¶ A list of supported fingerprint types
-
oddt.toolkits.rdk.informats= {'inchi': 'InChI', 'mol2': 'Tripos MOL2 file', 'sdf': 'MDL SDF file', 'smi': 'SMILES', 'mol': 'MDL MOL file'}¶ A dictionary of supported input formats
-
oddt.toolkits.rdk.outformats= {'inchikey': 'InChIKey', 'sdf': 'MDL SDF file', 'can': 'Canonical SMILES', 'smi': 'SMILES', 'mol': 'MDL MOL file', 'inchi': 'InChI'}¶ A dictionary of supported output formats
-
oddt.toolkits.rdk.readfile(format, filename, lazy=False, opt=None, **kwargs)[source]¶ Iterate over the molecules in a file.
- Required parameters:
- format – see the informats variable for a list of available input formats
- filename
You can access the first molecule in a file by calling next() on the iterator:
mol = next(readfile("smi", "myfile.smi"))
You can make a list of the molecules in a file using:
mols = list(readfile("smi", "myfile.smi"))
You can iterate over the molecules in a file as shown in the following code snippet:
>>> atomtotal = 0
>>> for mol in readfile("sdf", "head.sdf"):
...     atomtotal += len(mol.atoms)
...
>>> print(atomtotal)
43
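The three access patterns above (next(), list() and a for loop) work because the reader is an iterator that yields one record at a time. The following is a toolkit-independent sketch of that pattern; read_smiles is a hypothetical generator that parses SMILES lines from an in-memory text buffer and yields plain tuples rather than Molecule objects.

```python
import io


def read_smiles(handle):
    """Hypothetical reader: lazily yield (smiles, title) tuples from a text handle."""
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)
        smiles = parts[0]
        title = parts[1] if len(parts) > 1 else ""
        yield smiles, title


SMILES_TEXT = "CCO ethanol\nc1ccccc1 benzene\nCC(=O)O acetic acid\n"

# First record only: nothing beyond the first line is parsed.
first = next(read_smiles(io.StringIO(SMILES_TEXT)))

# All records at once.
everything = list(read_smiles(io.StringIO(SMILES_TEXT)))

# Streaming: accumulate a value without holding every record in memory.
total_chars = 0
for smiles, title in read_smiles(io.StringIO(SMILES_TEXT)):
    total_chars += len(smiles)
```

Iterating is usually preferable to list() for large files, since only one record needs to be held in memory at a time.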