Semantic Search with Python

TL;DR: Semantic search uses embeddings to capture the meaning of text so you retrieve relevant results even when the exact keywords don’t match.

🔍 What is Semantic Search?

Traditional search matches the exact words in your query against the words in documents. Example: searching “car” won’t return a document that only says “automobile”, because the word “car” never appears in it verbatim.
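
To make the limitation concrete, here is a minimal sketch of exact-word matching. The function name `keyword_search` and the two sample documents are made up for illustration; real keyword engines add stemming, ranking, and more, but the core mismatch is the same.

```python
def keyword_search(query, documents):
    """Return documents that share at least one exact word with the query."""
    query_words = set(query.lower().split())
    return [doc for doc in documents if query_words & set(doc.lower().split())]


# Hypothetical sample documents
docs = ["Used automobile for sale", "Car repair near me"]

# "car" only matches the document that contains the literal word "car"
print(keyword_search("car", docs))  # ['Car repair near me']
```

The automobile listing is missed even though it is clearly relevant, which is the gap semantic search is meant to close.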

Semantic search goes deeper: it uses NLP and embeddings to model the meaning (semantics) of words and queries, so synonyms and context are captured. Example: “best places to eat in Mumbai” returns restaurants even if their descriptions never use the word “eat”.

⚙️ How Does It Work?

Convert text into embeddings – numerical vectors representing meaning.
Store embeddings in a vector index or database (FAISS, Pinecone, Weaviate, Milvus).
Encode the query into an embedding at search time.
Retrieve the nearest vectors by similarity or distance (cosine similarity, dot product, L2 distance).
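
The retrieval step can be sketched in plain Python, without any model or vector database. The three-dimensional vectors below are made-up stand-ins for real embeddings (which typically have hundreds of dimensions), and the helper names are hypothetical; the sketch only shows how the three measures are computed and how a top-k lookup by cosine similarity might work.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def l2_distance(a, b):
    """Euclidean distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, doc_vecs, k):
    """Indices of the k documents with the highest cosine similarity to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Made-up toy "embeddings" (assumption: not produced by any real model)
toy_doc_vecs = [
    [0.9, 0.1, 0.0],  # pretend: a sports document
    [0.8, 0.2, 0.1],  # pretend: another sports document
    [0.0, 0.1, 0.9],  # pretend: a geography document
]
toy_query_vec = [1.0, 0.0, 0.0]  # pretend: a sports-related query

print(top_k(toy_query_vec, toy_doc_vecs, k=2))  # [0, 1]
```

A vector index such as FAISS performs the same kind of lookup, but with data structures designed to stay fast across millions of vectors.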

🐍 Can It Be Done with Python?

Absolutely. Here’s a compact end‑to‑end example using sentence-transformers for embeddings and faiss for fast vector search.

# Install dependencies
# pip install sentence-transformers faiss-cpu

from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

# 1) Load a pre-trained embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')

# 2) Example documents
documents = [
    "I love playing football",
    "Soccer is a popular sport worldwide",
    "The capital of France is Paris",
    "Python is great for machine learning",
]

# 3) Convert documents into embeddings
doc_embeddings = model.encode(documents, convert_to_tensor=False)

# 4) Build a FAISS index
dimension = doc_embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)  # exact (brute-force) L2 search
index.add(np.asarray(doc_embeddings, dtype="float32"))  # FAISS expects float32

# 5) Encode user query
query = "Which countries play soccer?"
query_embedding = model.encode([query])

# 6) Search in FAISS
k = 2  # top results
# D holds squared L2 distances, I holds the indices of the matching documents
D, I = index.search(np.asarray(query_embedding, dtype="float32"), k)

# 7) Print results
print("Query:", query)
for idx in I[0]:
    print("Match:", documents[idx])

Expected result: the top matches should be the soccer and football sentences, even though the query never mentions “football”.
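
The distances FAISS returns from an `IndexFlatL2` search are squared L2 distances, which can be hard to read as a relevance score. If the embeddings are unit length (an assumption here; `model.encode` accepts `normalize_embeddings=True` to enforce it), a squared L2 distance d² maps to cosine similarity as 1 − d²/2. The sketch below checks that relation on two made-up vectors; the helper names are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_from_squared_l2(squared_distance):
    """Cosine similarity of two unit-length vectors, given their squared L2 distance."""
    return 1.0 - squared_distance / 2.0


# Two made-up vectors, normalized to unit length
a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

squared_l2 = sum((x - y) ** 2 for x, y in zip(a, b))
direct_cosine = sum(x * y for x, y in zip(a, b))

# Both values should agree (about 0.96) up to floating-point error
print(cosine_from_squared_l2(squared_l2), direct_cosine)
```

Under that assumption, ranking by smallest L2 distance and ranking by highest cosine similarity give the same order, so the index above needs no changes to behave like a cosine search.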

🔑 Useful Python Tools

sentence-transformers – simple API for high‑quality text embeddings.
FAISS – fast, local vector search (Facebook AI Similarity Search).
Pinecone / Weaviate / Milvus – managed or self‑hosted vector DBs.
LangChain – glue for Retrieval‑Augmented Generation (RAG) apps.
