Search
 
SCRIPT & CODE EXAMPLE
 

PYTHON

tanimoto coefficient rdkit

from rdkit import Chem
from rdkit import DataStructs
from rdkit.Chem.Fingerprints import FingerprintMols
import pandas as pd

# read and Conconate the csv's
df_1 = pd.read_csv('first.csv')
df_2 = pd.read_csv('second.csv')
df_3 = pd.concat([df_1, df_2])

# proof and make a list of SMILES
df_smiles = df_3['smiles']
c_smiles = []
for ds in df_smiles:
    try:
        cs = Chem.CanonSmiles(ds)
        c_smiles.append(cs)
    except:
        print('Invalid SMILES:', ds)
print()

# make a list of mols
ms = [Chem.MolFromSmiles(x) for x in c_smiles]

# make a list of fingerprints (fp)
fps = [FingerprintMols.FingerprintMol(x) for x in ms]

# the list for the dataframe
qu, ta, sim = [], [], []

# compare all fp pairwise without duplicates
for n in range(len(fps)-1): # -1 so the last fp will not be used
    s = DataStructs.BulkTanimotoSimilarity(fps[n], fps[n+1:]) # +1 compare with the next to the last fp
    print(c_smiles[n], c_smiles[n+1:]) # witch mol is compared with what group
    # collect the SMILES and values
    for m in range(len(s)):
        qu.append(c_smiles[n])
        ta.append(c_smiles[n+1:][m])
        sim.append(s[m])
print()

# build the dataframe and sort it
d = {'query':qu, 'target':ta, 'Similarity':sim}
df_final = pd.DataFrame(data=d)
df_final = df_final.sort_values('Similarity', ascending=False)
print(df_final)

# save as csv
df_final.to_csv('third.csv', index=False, sep=',')
Comment

PREVIOUS NEXT
Code Example
Python :: line to curve dynamo revit 
Python :: how do i select a range of columns by index 
Python :: keep calm and carry on memes 
Python :: python which __divs__ are there 
Python :: how to remove zero after decimal float python 
Python :: qmenu hide python 
Python :: loading model 
Python :: python to pseudo code converter 
Python :: url python 
Python :: what does << do in python 
Python :: print next line 
Python :: shape of a dataframe 
Python :: temporary table pyspark 
Python :: how to get user input in python 
Python :: how to make dice roll in python 
Python :: login python code 
Python :: python formatting string 
Python :: python < 
Python :: python oneline if 
Python :: random forest 
Python :: float and int difference 
Python :: how to sort nested list in python 
Python :: list slicing reverse python 
Python :: python conditional statement 
Python :: what are arrays in python 
Python :: upload image to s3 python 
Python :: pygame draw square 
Python :: re python 
Python :: how to update image in django 
Python :: pandas add prefix to column names 
ADD CONTENT
Topic
Content
Source link
Name
4+6 =