Instagram Chatbot with RAG and Voice Transcription
RAG-powered chatbot over 22 specialized PDFs that handles Instagram DMs 24/7, including voice notes. Transcription costs 89% less than with OpenAI Whisper.
- 24/7: No human intervention
- 22: PDFs in the knowledge base
- −89%: Cost vs OpenAI Whisper
The problem
A sports supplements company with 22 specialized products was handling its Instagram DMs entirely by hand. The catalog demanded detailed technical knowledge: ingredients, dosages, contraindications, recommended combinations.
The team spent 3 to 4 hours a day answering questions that were often the same. Replies took hours. And messages arrived at all hours — as text and, very often, as voice notes.
The real bottleneck: it wasn’t the volume of messages. It was consistency. Every person who replied gave slightly different information. Some messages simply went unanswered.
The solution
I built a chatbot with RAG (Retrieval-Augmented Generation) that runs inside an n8n workflow and uses Supabase as a vector database. The flow works like this:
- An Instagram DM comes in (text or voice note)
- If it’s a voice note, Groq Whisper Large v3 Turbo transcribes it in seconds
- The message (or transcription) enters the RAG pipeline: it’s embedded and searched against the 22 catalog PDFs
- Claude generates a contextual answer, citing the exact product and the relevant page
- The reply goes back to the customer via Chatwoot (which manages the Instagram inbox)
- If the model’s confidence is low, the message is escalated to a human with the context already prepared
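The steps above can be sketched as a single routing function. This is a minimal illustration, not the production n8n workflow: `handle_dm`, `IncomingDM`, `Outcome`, and the `CONFIDENCE_THRESHOLD` value are hypothetical names, and the transcription, retrieval, and generation steps are passed in as plain callables so the routing logic can be read (and tested) on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical cutoff: the write-up does not say how "low confidence" is measured.
CONFIDENCE_THRESHOLD = 0.6


@dataclass
class IncomingDM:
    sender: str
    text: Optional[str] = None     # set for text messages
    audio: Optional[bytes] = None  # set for voice notes


@dataclass
class Outcome:
    action: str    # "reply" or "escalate"
    question: str  # the text that went through the RAG pipeline
    answer: str    # drafted answer, kept as context when escalating


def handle_dm(
    dm: IncomingDM,
    transcribe: Callable[[bytes], str],
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], tuple[str, float]],
) -> Outcome:
    # Steps 1-2: voice notes are transcribed first; text passes through unchanged.
    question = transcribe(dm.audio) if dm.audio is not None else (dm.text or "")
    # Step 3: fetch the catalog passages most relevant to the question.
    passages = retrieve(question)
    # Step 4: draft an answer from those passages, with a confidence score.
    answer, confidence = generate(question, passages)
    # Steps 5-6: send the reply, or hand off to a human with the context attached.
    if not passages or confidence < CONFIDENCE_THRESHOLD:
        return Outcome("escalate", question, answer)
    return Outcome("reply", question, answer)
```

Passing the three services in as arguments keeps the escalation rule independent of any one provider, so a stub can stand in for Groq, Supabase, or Claude during testing.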
The chatbot doesn’t improvise: it only answers with what’s documented. If it doesn’t know, it says so and escalates.
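One way to enforce "only answer with what's documented" is to refuse whenever no catalog chunk is similar enough to the question. The sketch below is an in-memory illustration under assumptions: in the real system the similarity search runs in Supabase with pgvector, and the chunk fields (`embedding`, `product`, `page`), the `min_similarity` value, and the function names are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, k=3, min_similarity=0.75):
    """Return up to k chunks whose similarity to the query clears the cutoff."""
    scored = [(cosine_similarity(query_vec, c["embedding"]), c) for c in chunks]
    scored = [pair for pair in scored if pair[0] >= min_similarity]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best match first
    return [chunk for _, chunk in scored[:k]]


def answer_or_escalate(query_vec, chunks):
    """Answer only when documented context exists; otherwise escalate."""
    hits = retrieve(query_vec, chunks)
    if not hits:
        return {"action": "escalate", "sources": []}
    # Keep product and page so the generated answer can cite them.
    return {"action": "answer", "sources": [(c["product"], c["page"]) for c in hits]}
```

The cutoff is the tuning knob: set too low, weakly related chunks get through and the model may answer off-topic; set too high, more messages land in the human queue.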
Why Groq instead of OpenAI for transcription
OpenAI Whisper charges per minute of audio. For this volume, the cost ran between $0.006 and $0.012 USD per minute, which is $0.36 to $0.72 per hour. Groq Whisper Large v3 Turbo runs at $0.04 USD per hour of audio: 89% cheaper than the lower OpenAI rate, for a model from the same Whisper family.
The quality is comparable because both services run Whisper models. The difference is Groq’s inference infrastructure, which is also significantly faster.
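The savings figure can be checked with a few lines of arithmetic. The prices are the ones quoted above; the function and variable names are only for illustration.

```python
OPENAI_USD_PER_MIN_LOW = 0.006   # lower per-minute rate quoted above
OPENAI_USD_PER_MIN_HIGH = 0.012  # upper per-minute rate quoted above
GROQ_USD_PER_HOUR = 0.04         # Groq Whisper Large v3 Turbo, per hour of audio


def savings_pct(baseline_per_hour, new_per_hour):
    """Percentage saved by paying new_per_hour instead of baseline_per_hour."""
    return (1 - new_per_hour / baseline_per_hour) * 100


# Convert the per-minute rates to per-hour so both providers use the same unit.
low = savings_pct(OPENAI_USD_PER_MIN_LOW * 60, GROQ_USD_PER_HOUR)
high = savings_pct(OPENAI_USD_PER_MIN_HIGH * 60, GROQ_USD_PER_HOUR)

print(round(low))   # 89
print(round(high))  # 94
```

The 89% headline corresponds to the lower rate of $0.006 per minute; against the upper rate the saving works out to about 94%.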
Results since launch
The system has been in production since November 2025. The team went from 3 to 4 hours a day handling DMs to under 30 minutes of review (only the escalated cases). Response consistency improved visibly, because it no longer depends on who’s on shift.
The client can now scale the catalog (add more PDFs) without touching code. Updating takes two steps: upload the new document to Supabase and re-run the embeddings pipeline.
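The first stage of an embeddings pipeline like this is usually splitting each PDF into chunks. The sketch below shows one common approach, fixed-size character windows with overlap, and is an assumption rather than the pipeline actually used here: `chunk_pages`, the row fields, and the default sizes are hypothetical, and PDF text extraction, embedding, and the insert into Supabase are left out.

```python
def chunk_pages(pages, source, chunk_size=800, overlap=100):
    """Split page texts into overlapping chunks, keeping the page number.

    pages  : list of page text strings, page 1 first
    source : document name stored with each chunk (e.g. the PDF filename)
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")

    rows = []
    step = chunk_size - overlap  # how far the window advances each time
    for page_number, text in enumerate(pages, start=1):
        for start in range(0, len(text), step):
            piece = text[start:start + chunk_size].strip()
            if piece:
                rows.append({"source": source, "page": page_number, "content": piece})
            # Stop once the window has reached the end of the page.
            if start + chunk_size >= len(text):
                break
    return rows
```

Storing the page number with every chunk is what lets the generated answer cite the relevant page; each row would then be embedded and written to the vector table.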
Detailed tech stack
| Component | Technology | Function |
|---|---|---|
| Orchestration | n8n | Main workflow, routing, escalation |
| Vector store | Supabase (pgvector) | Storage and semantic search |
| Transcription | Groq Whisper Large v3 Turbo | Voice note → text |
| Generation | Claude (Anthropic) | Contextual RAG response |
| Messaging | Twilio | WhatsApp API (fallback) |
| Omnichannel inbox | Chatwoot | Instagram inbox management and escalation |