RAG Agent using Docling and Weaviate
A chat-style, LLM-powered question-answering system that uses RAG (Retrieval-Augmented Generation) to answer questions from the content of PDF documents. The system uses Docling to parse PDFs and split them into structure-aware chunks, Weaviate as a vector database to store the embedded chunks, and OpenAI for embeddings and text generation.
Let's build a PDF RAG Agent with:
- PDF Document Processing: Efficiently parses and chunks PDF documents for analysis.
- Vector Storage with Weaviate: Stores and manages vectorized document chunks.
- Docling for Advanced Parsing: Utilizes Docling for intelligent PDF parsing and hybrid chunking.
- OpenAI Integration: Leverages OpenAI for creating embeddings and generating text.
- RAG Pattern for Q&A: Implements Retrieval-Augmented Generation for accurate question answering.
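The retrieve-then-generate loop behind the RAG pattern can be sketched in plain Python. In this project Weaviate performs the similarity search and OpenAI produces the embeddings and the final answer. The sketch below assumes both are replaced by toy 3-dimensional vectors and a prompt string so the flow is visible. `Chunk`, `retrieve`, and `build_prompt` are hypothetical names, not part of the project.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    embedding: list[float]  # in the real system this comes from an OpenAI embedding model


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_embedding: list[float], chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """Return the k chunks most similar to the query (the vector database's job)."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_embedding, c.embedding),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[Chunk]) -> str:
    """Ground the LLM by placing the retrieved chunks in the prompt."""
    context = "\n\n".join(f"[{i}] {c.text}" for i, c in enumerate(context_chunks, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Toy example: the query vector points mostly toward the first chunk.
docs = [
    Chunk("Weaviate stores vectors.", [1.0, 0.0, 0.0]),
    Chunk("Docling parses PDFs.", [0.0, 1.0, 0.0]),
    Chunk("Unrelated text.", [0.0, 0.0, 1.0]),
]
prompt = build_prompt("What stores vectors?", retrieve([0.9, 0.1, 0.0], docs, k=2))
```

The resulting prompt string is what would be sent to the chat model; only the retrieved chunks reach the LLM, which is what keeps the answer tied to the PDFs.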
The Steps
- api-process-pdfs.step.ts
- api-query-rag.step.ts
- init-weaviate.step.ts
- load-weaviate.step.ts
- process-pdfs.step.py
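These steps run inside Motia's event-driven workflow engine: an API step accepts a request and emits an event, and downstream steps subscribe to topics and emit further events. The sketch below imitates that wiring with a tiny in-process bus. It is not Motia's API, and the topic names (`rag.read.pdfs`, `rag.process.pdfs`, `rag.chunks.ready`) and the step ordering are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Callable

Emit = Callable[[str, dict], None]
Handler = Callable[[dict, Emit], None]


class EventBus:
    """Minimal in-process topic bus; a stand-in for illustration, not Motia's API."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)
        self.log: list[str] = []  # topics in the order they were emitted

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def emit(self, topic: str, data: dict) -> None:
        self.log.append(topic)
        for handler in self._subscribers[topic]:
            handler(data, self.emit)


stored_chunks: list[str] = []


def init_weaviate(data: dict, emit: Emit) -> None:
    # Hypothetical: make sure the Weaviate collection exists, then hand off.
    emit("rag.process.pdfs", data)


def process_pdfs(data: dict, emit: Emit) -> None:
    # Hypothetical: Docling would parse and chunk each PDF here.
    chunks = [f"chunk from {name}" for name in data["files"]]
    emit("rag.chunks.ready", {"chunks": chunks})


def load_weaviate(data: dict, emit: Emit) -> None:
    # Hypothetical: embed the chunks and batch-insert them into Weaviate.
    stored_chunks.extend(data["chunks"])


bus = EventBus()
bus.subscribe("rag.read.pdfs", init_weaviate)
bus.subscribe("rag.process.pdfs", process_pdfs)
bus.subscribe("rag.chunks.ready", load_weaviate)

# What the API step would do after validating the incoming HTTP request:
bus.emit("rag.read.pdfs", {"files": ["a.pdf", "b.pdf"]})
```

Because each step only knows the topics it subscribes to and emits, a step written in Python (the Docling step) can sit between TypeScript steps without either side calling the other directly.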
🚀 Features
- PDF document processing and chunking: parse PDF documents and split them into chunks.
- Vector storage using Weaviate: store and manage the vectorized document chunks.
- Docling for PDF parsing and hybrid chunking: use Docling's hybrid chunker for structure-aware chunking.
- OpenAI integration for embeddings and text generation: use OpenAI to create embeddings and generate answers.
- Question answering using the RAG pattern: retrieve relevant chunks and pass them to the LLM so its answers are grounded in the documents.
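Hybrid chunking combines the document's structure with a token budget: content is first grouped by section, oversized pieces are split, and small neighbouring pieces in the same section are merged until the budget is reached. The sketch below approximates that idea; it is not Docling's implementation. It assumes whitespace-separated words stand in for tokens (Docling uses a real tokenizer), and `Section` and `hybrid_chunk` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Section:
    heading: str
    paragraphs: list[str]


def count_tokens(text: str) -> int:
    # Assumption: one whitespace-separated word approximates one token.
    return len(text.split())


def split_oversized(paragraph: str, max_tokens: int) -> list[str]:
    """Split a paragraph into consecutive windows of at most max_tokens words."""
    words = paragraph.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


def hybrid_chunk(sections: list[Section], max_tokens: int) -> list[str]:
    chunks: list[str] = []
    for section in sections:
        # 1. Structure: never mix content from different sections.
        pieces: list[str] = []
        for paragraph in section.paragraphs:
            pieces.extend(split_oversized(paragraph, max_tokens))
        # 2. Token budget: merge small neighbouring pieces up to max_tokens.
        current: list[str] = []
        current_tokens = 0
        for piece in pieces:
            n = count_tokens(piece)
            if current and current_tokens + n > max_tokens:
                chunks.append(f"{section.heading}\n" + " ".join(current))
                current, current_tokens = [], 0
            current.append(piece)
            current_tokens += n
        if current:
            # The heading is prefixed as context; for simplicity it is not counted in the budget.
            chunks.append(f"{section.heading}\n" + " ".join(current))
    return chunks


doc = [
    Section("Intro", ["one two three", "four five"]),
    Section("Methods", ["a b c d e f g"]),
]
result = hybrid_chunk(doc, max_tokens=4)
```

Prefixing each chunk with its section heading keeps some document context attached to the chunk when it is embedded and later retrieved on its own.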
📋 Prerequisites
- Node.js v18 or later
- npm or pnpm
- API keys for:
  - OpenAI
  - Weaviate
🛠️ Installation
- Clone the repository:
- Install dependencies:
- Configure environment variables by updating .env with your API keys:
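A quick way to catch configuration mistakes before starting the server is to check that every required key is set. The sketch below assumes the variable names `OPENAI_API_KEY`, `WEAVIATE_URL`, and `WEAVIATE_API_KEY`; these are hypothetical, so use whatever names the project's .env actually expects. The `.env` parser is deliberately simplified (no multi-line values or escape sequences).

```python
import os
from typing import Mapping, Optional

# Hypothetical variable names; align these with the project's .env.
REQUIRED_VARS = ("OPENAI_API_KEY", "WEAVIATE_URL", "WEAVIATE_API_KEY")


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_env_vars(
    env: Optional[Mapping[str, str]] = None,
    required: tuple[str, ...] = REQUIRED_VARS,
) -> list[str]:
    """Return the required names that are unset or blank (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


sample = 'OPENAI_API_KEY=sk-test\n# comment\nWEAVIATE_URL="https://example.weaviate.network"\n'
missing = missing_env_vars(parse_dotenv(sample))
```

In this sample the Weaviate API key is absent, so `missing` would list it; a startup script could refuse to run until the list is empty.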
🏗️ Architecture
🏗️ Technologies
- TypeScript
- Python
- Docling
- Weaviate
- OpenAI
🚦 API Endpoints
Process PDFs
Response:
Query RAG System
Response:
Error Response:
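The exact request and response bodies for these endpoints are not reproduced here. As a client-side sketch, the code below assumes a hypothetical success shape of `{"answer": ..., "chunks": [...]}` and a hypothetical error shape of `{"error": ...}`; `RagAnswer`, `RagQueryError`, and `parse_query_response` are illustrative names, so adjust the field names to the real responses.

```python
import json
from dataclasses import dataclass


@dataclass
class RagAnswer:
    answer: str
    chunks: list[str]  # hypothetical field: the retrieved context returned with the answer


class RagQueryError(Exception):
    """Raised for error responses or bodies missing the expected fields."""


def parse_query_response(status: int, body: str) -> RagAnswer:
    # Tolerate empty or non-JSON bodies (e.g. a proxy's plain-text error page).
    try:
        payload = json.loads(body) if body else {}
    except json.JSONDecodeError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    if status >= 400:
        # Hypothetical error shape: {"error": "..."}; fall back to the status code.
        raise RagQueryError(str(payload.get("error", f"HTTP {status}")))
    if "answer" not in payload:
        raise RagQueryError("response has no 'answer' field")
    return RagAnswer(
        answer=str(payload["answer"]),
        chunks=[str(c) for c in payload.get("chunks", [])],
    )
```

Turning error responses into an exception keeps calling code simple: it either receives a `RagAnswer` or handles one error type.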
🏃‍♂️ Running the Application
- Start the development server:
- Access the Motia Workbench:
- Make test requests:
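Test requests can also be scripted with the Python standard library. The sketch below assumes the development server listens on `http://localhost:3000` and that the query endpoint is `POST /api/rag/query` with a JSON body `{"query": ...}`; the URL, path, and body field are all assumptions, so replace them with the values the project actually uses. `build_query_request` and `send` are hypothetical helpers.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # assumption: adjust to your dev server's address


def build_query_request(
    question: str,
    base_url: str = BASE_URL,
    path: str = "/api/rag/query",  # hypothetical endpoint path
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the query endpoint."""
    body = json.dumps({"query": question}).encode("utf-8")  # hypothetical body field
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_query_request("What does the document say about chunking?"))` would return the decoded response; building the request separately makes it easy to inspect or log before sending.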
🙏 Acknowledgments
- Motia Framework for the event-driven workflow engine
- Docling for PDF parsing and hybrid chunking
- Weaviate for the vector database
- OpenAI for embeddings and text generation
Need help? See our Community Resources for questions, examples, and discussions.