AI Agent for Technical Reviews: A Simple RAG Project

To deepen my understanding of RAG (Retrieval-Augmented Generation), I embarked on a simple project. The goal of this project was to develop an AI agent to assist with engineering technical reviews. Using company data, I analyzed past review requests and results to build an AI agent capable of generating necessary review items for new requests based on historical data.


Overview of the Project (AI Agent for Technical Reviews)

Project Objectives

  • Goal: Build an AI agent for technical reviews that automates the handling of engineering review requests using historical data.
  • Data Used: Company data in CSV format.
  • Tools and Libraries: LangChain module.

1. Data Cleaning and Transformation

CSV Data Cleaning
The raw CSV data contained many irrelevant fields. To keep it manageable, I loaded it into a pandas DataFrame and kept only the columns I needed. The cleaned data was then saved back to a CSV file.
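As a rough illustration of this cleaning step, here is a minimal sketch. The column names (request_id, request_text, review_items) and the sample rows are hypothetical, since the real company schema is not shown in this post, and dropping incomplete or duplicate rows is my assumption about what "cleaning" involves.

```python
import io

import pandas as pd

# Hypothetical column names; the real CSV schema is not given in this post.
KEEP_COLUMNS = ["request_id", "request_text", "review_items"]


def clean_reviews(csv_source, keep_columns=KEEP_COLUMNS):
    """Load a CSV, keep only the needed columns, and drop unusable rows."""
    df = pd.read_csv(csv_source, usecols=keep_columns)
    df = df.dropna(subset=keep_columns)  # rows missing a required field
    df = df.drop_duplicates().reset_index(drop=True)
    return df


# Tiny in-memory stand-in for the raw export (made-up rows).
raw = io.StringIO(
    "request_id,requester,request_text,review_items,internal_flag\n"
    "1,kim,Check weld spec,Verify WPS,x\n"
    "2,lee,,Check tolerances,y\n"
    "3,park,Review pump sizing,Confirm NPSH margin,z\n"
)
cleaned = clean_reviews(raw)

# Pass a file path to to_csv() instead to save the result to disk.
csv_text = cleaned.to_csv(index=False)
```

Reading with usecols avoids loading the irrelevant fields at all, which keeps memory use low on larger exports.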

Data Transformation
I converted the cleaned CSV data into Document objects using LangChain’s CSVLoader. This step matters because Document is the format that LangChain's vector stores and embedding components take as input.

  • Recommended Formats: XML or JSON
    Formatting the content as XML or JSON is preferable because each value stays attached to its heading (field name), which tends to improve results when the text is passed to Large Language Models (LLMs).
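The loading step, and the JSON alternative mentioned above, could look roughly like the sketch below. The column names are hypothetical, and the import path assumes a recent LangChain release where CSVLoader lives in the langchain_community package. The helper row_to_json_text is my own illustration and not part of LangChain.

```python
import json

# Hypothetical column names; the post does not list the real schema.
CONTENT_FIELDS = ["request_text", "review_items"]
ID_FIELD = "request_id"


def load_with_csvloader(path):
    """One Document per CSV row; page_content holds 'column: value' lines."""
    # Lazy import so the JSON helper below also runs without LangChain installed.
    from langchain_community.document_loaders import CSVLoader

    loader = CSVLoader(
        file_path=path,
        content_columns=CONTENT_FIELDS,  # text that will be embedded
        metadata_columns=[ID_FIELD],     # kept alongside for later lookup
        encoding="utf-8",
    )
    return loader.load()


def row_to_json_text(row, content_fields=CONTENT_FIELDS):
    """Alternative content format: JSON keeps field names with their values."""
    return json.dumps({name: row[name] for name in content_fields}, ensure_ascii=False)


row = {"request_id": "1", "request_text": "Check weld spec", "review_items": "Verify WPS"}
json_text = row_to_json_text(row)
```

Keeping the identifier in metadata rather than in the embedded text means it does not influence similarity search but is still available on every retrieved Document.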
[Figure: RAG project data preprocessing]
[Figure: RAG project data loading]

2. Generating Embeddings

Model Selection
For embedding, I used the HuggingFace intfloat/multilingual-e5-large-instruct model. This model was chosen for the following reasons:

  1. Excellent performance.
  2. Compatibility with local environments.
  3. Avoidance of external API usage, reducing costs.
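A minimal sketch of loading this model locally on CPU through LangChain is shown below. It assumes the langchain_huggingface package is installed. As I understand the model card, the instruct variant expects search queries to carry a one-line task instruction, while documents are embedded as-is; the task wording used here is only an example.

```python
MODEL_NAME = "intfloat/multilingual-e5-large-instruct"


def format_query(task, query):
    """Prefix a search query with a task instruction, per the e5 instruct format."""
    return f"Instruct: {task}\nQuery: {query}"


def build_embedder(device="cpu"):
    """Create a local embedding model; weights are downloaded once, then cached."""
    # Lazy import so format_query can be used without the package installed.
    from langchain_huggingface import HuggingFaceEmbeddings

    return HuggingFaceEmbeddings(
        model_name=MODEL_NAME,
        model_kwargs={"device": device},  # "cpu" here, since no GPU was available
        encode_kwargs={"normalize_embeddings": True},
    )


# Example task description (an assumption, not taken from the project).
example_query = format_query(
    "Given an engineering review request, retrieve similar past requests",
    "Review pump sizing for the cooling loop",
)
```

With this in place, embedder.embed_documents(texts) would be used for the stored records and embedder.embed_query(example_query) for a new request.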

Embedding in a Local Environment
Since this project is designed for company use, I utilized a local setup without external dependencies. The current setup runs on CPU as no GPU resources were available.

  • The dataset was relatively small, so the entire embedding process took about one hour.
  • The embedded data was stored in Chroma DB, which supports local storage. For reference, FAISS is another viable alternative for local storage.
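Storing the embeddings in a local Chroma collection might look like the sketch below. It assumes the langchain_chroma package; the collection name, directory, and batch size are made up for illustration. Adding documents in batches is my own suggestion for long CPU runs, so that progress is written to disk as the job goes.

```python
def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_store(docs, embedder, persist_dir="./chroma_reviews", batch_size=64):
    """Embed documents batch by batch and persist them in a local Chroma collection."""
    # Lazy import so the batching helper runs without Chroma installed.
    from langchain_chroma import Chroma

    store = Chroma(
        collection_name="review_requests",  # hypothetical name
        embedding_function=embedder,
        persist_directory=persist_dir,      # data stays on the local disk
    )
    for batch in batched(docs, batch_size):
        store.add_documents(batch)
    return store
```

Reopening the store later only needs the same collection name, directory, and embedding function; the documents do not have to be embedded again.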

Optimizing for Retrieval
Only the fields needed for retrieval were embedded. The remaining data will be looked up in the existing database using the metadata stored with each embedded record.
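The lookup side of this idea can be sketched as follows. An in-memory SQLite table stands in for the existing database, and the table layout, the request_id key, and the sample rows are all hypothetical. The SimpleNamespace objects stand in for retrieved Documents, which expose the same metadata attribute.

```python
import sqlite3
from types import SimpleNamespace


def fetch_full_records(conn, hits, id_key="request_id"):
    """Look up the full rows for retrieved documents, keeping retrieval order."""
    # Metadata loaded from CSV arrives as strings, so normalize to int.
    ids = [int(hit.metadata[id_key]) for hit in hits]
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        "SELECT request_id, requester, result FROM reviews "
        f"WHERE request_id IN ({placeholders})",
        ids,
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    return [by_id[i] for i in ids if i in by_id]


# Stand-in for the existing company database (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews (request_id INTEGER PRIMARY KEY, requester TEXT, result TEXT)"
)
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?)",
    [(1, "kim", "approved"), (3, "park", "revise")],
)

# Stand-ins for similarity-search results, best match first.
hits = [
    SimpleNamespace(metadata={"request_id": "3"}),
    SimpleNamespace(metadata={"request_id": "1"}),
]
records = fetch_full_records(conn, hits)
```

Because only the identifier travels with each vector, the vector store stays small and the database remains the single source for the full record.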

[Figure: RAG project embedding and vector DB save]

Conclusion

This simple project gave me hands-on experience with the basic structure of RAG and how to apply it in practice. With data cleaning, embedding generation, and local vector storage in place, I now have a solid starting point for building the AI agent.

Future steps include optimizing LLM performance and applying the AI agent to real-world engineering review scenarios.
