"""
This is a simple application of sentence embeddings: semantic search.

We have a corpus of sentences. For a given query sentence, we want to find
the most similar sentences in this corpus. For each query, this script prints
the top 5 most similar sentences in the corpus.
"""
from sentence_transformers import SentenceTransformer, util
import torch
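# Load a pre-trained model that maps each sentence to a dense embedding vector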
embedder = SentenceTransformer('all-MiniLM-L6-v2')
# Corpus with example sentences
corpus = [
    'A man is eating food.',
    'A man is eating a piece of bread.',
    'The girl is carrying a baby.',
    'A man is riding a horse.',
    'A woman is playing violin.',
    'Two men pushed carts through the woods.',
    'A man is riding a white horse on an enclosed ground.',
    'A monkey is playing drums.',
    'A cheetah is running behind its prey.'
]
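# Encode the whole corpus into one tensor (one embedding per sentence)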
corpus_embeddings = embedder.encode(corpus, convert_to_tensor=True)
# Query sentences:
queries = [
    'A man is eating pasta.',
    'Someone in a gorilla costume is playing a set of drums.',
    'A cheetah chases prey across a field.'
]
# Find the closest top_k sentences in the corpus for each query, based on cosine similarity
top_k = min(5, len(corpus))
for query in queries:
    query_embedding = embedder.encode(query, convert_to_tensor=True)

    # Use cosine similarity and torch.topk to find the top_k highest scores
    cos_scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
    top_results = torch.topk(cos_scores, k=top_k)

    print("\n\n======================\n\n")
    print("Query:", query)
    print(f"\nTop {top_k} most similar sentences in corpus:")

    for score, idx in zip(top_results.values, top_results.indices):
        print(corpus[idx], "(Score: {:.4f})".format(score))
"""
# Alternatively, we can also use util.semantic_search to perform cosine similarty + topk
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=5)
hits = hits[0] #Get the hits for the first query
for hit in hits:
print(corpus[hit['corpus_id']], "(Score: {:.4f})".format(hit['score']))
"""