Embedding Beginner Example for RAG


The Role of Embedding in RAG

Retrieval-Augmented Generation (RAG) is a method that enhances response generation by retrieving relevant information before generating an answer.
In this process, embedding plays a crucial role in converting documents and queries into vector representations, allowing the system to retrieve the most relevant information efficiently.
Embedding transforms the content of a document into numerical vectors, which helps quantify its meaning and apply it to various NLP tasks. Document embeddings are generated by tokenizing text and feeding it into a model to produce vectorized representations. These embeddings are then used in RAG to retrieve relevant documents based on similarity.

Why is Embedding Important?

  • Efficient Retrieval: Quickly finds similar documents in large datasets.
  • Enhanced Context Understanding: Helps models provide richer and more informative responses.
  • Improved Accuracy: Considers semantic similarity rather than relying only on keyword matching.
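To make "semantic similarity" concrete, here is a minimal sketch using hand-made toy vectors (the numbers are illustrative, not real model outputs): documents are ranked by cosine similarity to a query vector, so a document can rank highly even when it shares no keywords with the query.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 = same direction
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real models output hundreds of dimensions)
documents = {
    "How to train a neural network": [0.9, 0.1, 0.2],
    "Best pasta recipes":            [0.1, 0.9, 0.3],
    "Deep learning tutorial":        [0.8, 0.2, 0.1],
}
query_vector = [0.85, 0.15, 0.15]  # e.g. "machine learning guide"

# Rank documents by similarity to the query
ranked = sorted(documents.items(),
                key=lambda kv: cosine_similarity(query_vector, kv[1]),
                reverse=True)
for title, vec in ranked:
    print(f"{cosine_similarity(query_vector, vec):.3f}  {title}")
```

The two machine-learning documents score far above the cooking one, even though none of them shares an exact keyword with the query. This ranking step is exactly what the retriever in a RAG pipeline does at scale.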

Comparing Different Embedding Models

There are multiple ways to generate embeddings in RAG. In this blog, we will compare and implement three popular methods:
OpenAI Embedding, CacheBacked Embedding, and HuggingFace Embedding.

(1) Using OpenAI Embedding

OpenAI provides the text-embedding-ada-002 model, which can convert text into vector embeddings.

from openai import OpenAI

# Requires the openai>=1.0 SDK (the legacy openai.Embedding.create API is removed)
client = OpenAI(api_key="YOUR_OPENAI_API_KEY")

def get_openai_embedding(text):
    # Returns the embedding as a list of floats
    response = client.embeddings.create(
        input=text,
        model="text-embedding-ada-002",
    )
    return response.data[0].embedding

text = "How to generate embeddings"
embedding_vector = get_openai_embedding(text)
print(embedding_vector[:5])  # Print the first five dimensions of the vector

Advantages:

  • High performance and powered by OpenAI’s advanced language models.
  • Supports multiple languages.

Disadvantages:

  • Requires API calls, leading to potential costs.
  • Can be slower than local embedding models.

(2) Using CacheBacked Embedding

CacheBacked Embedding reduces repeated API calls by caching embeddings, and it is easy to set up with LangChain.
Embedding generation can be costly, and re-computing embeddings for the same document is wasteful. CacheBacked Embedding avoids these redundant calculations by reusing stored results.

from langchain.storage import LocalFileStore
from langchain_openai import OpenAIEmbeddings
from langchain.embeddings import CacheBackedEmbeddings

# Underlying OpenAI embedding model
embedding = OpenAIEmbeddings()

# Local file store that persists cached embeddings to ./cache/
store = LocalFileStore("./cache/")

# Cache-backed embedder: computes with `embedding`, stores results in `store`
cached_embedder = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings=embedding,
    document_embedding_cache=store,
    namespace=embedding.model,  # cache keys are namespaced by model name
)
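Conceptually, a cache-backed embedder is just a lookup-before-compute wrapper. The sketch below (plain Python, no LangChain; the class and function names are illustrative) shows the idea: each text is embedded at most once, and repeat requests are served from the cache.

```python
class CachedEmbedder:
    """Wraps an embedding function with a simple in-memory cache."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn   # the expensive call (e.g. an API request)
        self.cache = {}            # text -> embedding
        self.calls = 0             # how many real computations happened

    def embed(self, text):
        if text not in self.cache:  # compute only on a cache miss
            self.calls += 1
            self.cache[text] = self.embed_fn(text)
        return self.cache[text]

# A stand-in for a real embedding model
def fake_embed(text):
    return [float(len(text)), float(text.count(" "))]

embedder = CachedEmbedder(fake_embed)
embedder.embed("hello world")
embedder.embed("hello world")  # served from cache, no second computation
print(embedder.calls)  # -> 1
```

In the LangChain version above, LocalFileStore plays the role of the in-memory dictionary but persists entries to disk, so the cache survives restarts.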

Advantages:

  • Reduces API call costs (reuses cached values for the same text).
  • Improves speed.

Disadvantages:

  • The first lookup for any given text still incurs a full embedding call; only repeat lookups benefit.

(3) Using HuggingFace Embedding

langchain also supports embedding models from HuggingFace. Unlike OpenAI embeddings, HuggingFace models run locally, which means they do not require external API calls. While they may be slower than OpenAI’s model, they eliminate API costs.
Personally, I prefer using this model as my projects often involve handling sensitive company data, making local models more secure.

from langchain_huggingface import HuggingFaceEmbeddings

model_name = "intfloat/multilingual-e5-large-instruct"

# Runs the model locally (downloaded on first use); no external API calls
hf_embeddings = HuggingFaceEmbeddings(model_name=model_name)

Advantages:

  • No API cost as it is open-source.
  • Can run locally, ensuring better security.
  • Fast execution when properly set up.

Disadvantages:

  • Requires additional local resources (CPU/GPU) for execution.
  • May have lower performance compared to OpenAI models.

Comparison and Selection Guide

| Method                | Advantages                                    | Disadvantages                   |
| --------------------- | --------------------------------------------- | ------------------------------- |
| OpenAI Embedding      | High performance, supports multiple languages | API cost, slower speed          |
| CacheBacked Embedding | Reduces API cost, improves speed              | Requires initial caching        |
| HuggingFace Embedding | Free to use, secure, runs locally             | Needs local setup, may be slower|

Recommended Use Cases

  • For quick development & high performance needs → OpenAI Embedding
  • For cost-saving optimization → CacheBacked Embedding
  • For local execution without API costs → HuggingFace Embedding

Conclusion & Next Steps

Embedding plays a crucial role in retrieving and processing information in RAG.
In this blog, we compared three embedding approaches: OpenAI, CacheBacked, and HuggingFace models.
