Python

tanimoto coefficient rdkit

from rdkit import Chem
from rdkit import DataStructs
from rdkit.Chem.Fingerprints import FingerprintMols
import pandas as pd

# read and concatenate the CSV files
df_1 = pd.read_csv('first.csv')
df_2 = pd.read_csv('second.csv')
df_3 = pd.concat([df_1, df_2])

# validate the SMILES and collect their canonical forms
df_smiles = df_3['smiles']
c_smiles = []
for ds in df_smiles:
    try:
        cs = Chem.CanonSmiles(ds)
        c_smiles.append(cs)
    except Exception:
        # raised when RDKit cannot parse the SMILES
        print('Invalid SMILES:', ds)
print()

# make a list of mols
ms = [Chem.MolFromSmiles(x) for x in c_smiles]

# make a list of fingerprints (fp)
fps = [FingerprintMols.FingerprintMol(x) for x in ms]

# the list for the dataframe
qu, ta, sim = [], [], []

# compare all fp pairwise without duplicates
for n in range(len(fps)-1): # stop before the last fp: nothing is left to compare it with
    s = DataStructs.BulkTanimotoSimilarity(fps[n], fps[n+1:]) # compare fp n with every fp after it
    print(c_smiles[n], c_smiles[n+1:]) # which mol is compared with which group
    # collect the SMILES and values
    for m in range(len(s)):
        qu.append(c_smiles[n])
        ta.append(c_smiles[n+1+m])
        sim.append(s[m])
print()

# build the dataframe and sort it
d = {'query':qu, 'target':ta, 'Similarity':sim}
df_final = pd.DataFrame(data=d)
df_final = df_final.sort_values('Similarity', ascending=False)
print(df_final)

# save as csv
df_final.to_csv('third.csv', index=False, sep=',')
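
For reference, the Tanimoto coefficient of two bit fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below shows that formula and the same "every pair once" comparison in plain Python, without RDKit. The helper name `tanimoto`, the labels A, B, C and the bit sets are made up for illustration, and returning 0.0 for two empty sets is a choice made here, not necessarily what RDKit does.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto coefficient of two sets of on-bit positions (hypothetical helper)."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen for this sketch only
    return len(a & b) / union

# toy fingerprints: each set holds the positions of the bits that are on
toy_fps = {
    'A': {1, 2, 3, 4},
    'B': {3, 4, 5},
    'C': {1, 2},
}

# combinations() yields every unordered pair once, like the n / n+1: loop above
pairs = [(q, t, tanimoto(toy_fps[q], toy_fps[t])) for q, t in combinations(toy_fps, 2)]

# sort by similarity, highest first
pairs.sort(key=lambda row: row[2], reverse=True)
for q, t, sim in pairs:
    print(q, t, sim)
```

With n fingerprints this gives n*(n-1)/2 rows, which is also the number of rows the snippet above writes to third.csv.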
