In production

Instagram Chatbot with RAG and Voice Transcription

RAG-powered chatbot over 22 specialized PDFs that handles Instagram DMs 24/7, including voice notes. Voice transcription costs 89% less than with OpenAI Whisper.

24/7

No human intervention

22

PDFs in the knowledge base

−89%

Cost vs OpenAI Whisper

n8n Supabase Groq Twilio Chatwoot

The problem

A sports supplements company with 22 specialized products was handling its Instagram DMs entirely by hand. The catalog demanded detailed technical knowledge: ingredients, dosages, contraindications, recommended combinations.

The team spent 3 to 4 hours a day answering questions that were often the same. Replies took hours. And messages arrived around the clock, as text and, very often, as voice notes.

The real bottleneck: it wasn’t the volume of messages. It was consistency. Every person who replied gave slightly different information. Some messages simply went unanswered.

The solution

I built a chatbot with RAG (Retrieval-Augmented Generation) that runs inside an n8n workflow and uses Supabase as its vector database. The flow works like this:

1. An Instagram DM comes in (text or voice note)
2. If it’s a voice note, Groq Whisper Large v3 Turbo transcribes it in seconds
3. The message (or transcription) enters the RAG pipeline: it’s embedded and searched against the 22 catalog PDFs
4. Claude generates a contextual answer, citing the exact product and the relevant page
5. The reply goes back to the customer via Chatwoot (which manages the Instagram inbox)
6. If the model’s confidence is low, the message is escalated to a human with the context already prepared
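
The flow above can be sketched in Python as a single routing function. This is a minimal sketch, not the actual n8n workflow: the `transcribe`, `retrieve`, and `generate` callables, the `Chunk` and `Outcome` types, and the 0.75 threshold are all hypothetical. It also assumes the top retrieval similarity is used as a stand-in for "model confidence", which the real workflow may measure differently.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Chunk:
    product: str
    page: int
    text: str
    similarity: float  # 0..1, higher means closer to the question


@dataclass
class Outcome:
    action: str            # "reply" or "escalate"
    question: str          # the text that went through the RAG pipeline
    reply: Optional[str]   # None when escalated
    context: list[Chunk]   # retrieved chunks, handed to the human on escalation


def handle_dm(
    text: Optional[str],
    audio: Optional[bytes],
    transcribe: Callable[[bytes], str],
    retrieve: Callable[[str], list[Chunk]],
    generate: Callable[[str, list[Chunk]], str],
    min_similarity: float = 0.75,  # hypothetical threshold
) -> Outcome:
    # Voice notes are transcribed first; text messages pass through unchanged.
    question = transcribe(audio) if audio is not None else (text or "")

    # Look up the catalog chunks closest to the question.
    chunks = retrieve(question)
    best = max((c.similarity for c in chunks), default=0.0)

    # Weak retrieval is treated as low confidence: hand off with context attached.
    if best < min_similarity:
        return Outcome("escalate", question, None, chunks)

    # Otherwise generate an answer grounded in the retrieved chunks.
    return Outcome("reply", question, generate(question, chunks), chunks)
```

Passing the three stages in as callables keeps the routing logic testable without calling Groq, Supabase, or Claude.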

The chatbot doesn’t improvise: it only answers with what’s documented. If it doesn’t know, it says so and escalates.
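
One common way to enforce this behavior is to put the constraint in the prompt itself. The sketch below is an illustration under assumptions, not the production prompt: the wording, the `ESCALATE` sentinel, and the `product` / `page` / `text` keys are all hypothetical.

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Build a prompt that restricts the model to the retrieved excerpts."""
    # Number each excerpt and label it with its source, so the model can cite it.
    sources = "\n\n".join(
        f"[{i}] {c['product']}, page {c['page']}\n{c['text']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the catalog excerpts below.\n"
        "Cite the product and page for every claim.\n"
        # Hypothetical sentinel the workflow could check for to trigger escalation.
        "If the excerpts do not contain the answer, reply exactly: ESCALATE\n\n"
        f"Excerpts:\n{sources}\n\n"
        f"Customer question: {question}"
    )
```

The workflow can then treat a reply of `ESCALATE` as a signal to route the conversation to a human instead of sending it to the customer.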

Why Groq instead of OpenAI for transcription

OpenAI Whisper charges per minute of audio. For this volume, the cost ran between $0.006 and $0.012 USD per minute, which is $0.36 to $0.72 per hour. Groq Whisper Large v3 Turbo runs at $0.04 USD per hour of audio. That is about 89% cheaper than even the lower figure, on a model from the same Whisper family.
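
The 89% figure can be checked by converting both rates to the same unit. The snippet below uses only the prices quoted above; the function name is made up for illustration.

```python
def savings_vs_openai(openai_usd_per_min: float, groq_usd_per_hour: float) -> float:
    """Return the fraction saved by paying the Groq hourly rate instead."""
    openai_usd_per_hour = openai_usd_per_min * 60
    return 1 - groq_usd_per_hour / openai_usd_per_hour


low = savings_vs_openai(0.006, 0.04)   # against the lower OpenAI figure
high = savings_vs_openai(0.012, 0.04)  # against the higher OpenAI figure

print(f"{low:.1%}")   # 88.9%
print(f"{high:.1%}")  # 94.4%
```

So 89% corresponds to the lower per-minute rate; against the higher rate the saving would be larger.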

The quality is comparable because both services run models from the same Whisper family. The difference is Groq’s inference infrastructure, which is also significantly faster.
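
For reference, a transcription call through Groq's Python SDK looks roughly like this. It is a sketch under assumptions: the production system calls Groq from n8n rather than from Python, and the function name and the `voice-note.ogg` filename are hypothetical. The client is passed in so the function can be exercised without network access.

```python
def transcribe_voice_note(client, audio: bytes, filename: str = "voice-note.ogg") -> str:
    """Send one voice note to Groq and return the transcript text."""
    result = client.audio.transcriptions.create(
        file=(filename, audio),          # (name, raw bytes) tuple
        model="whisper-large-v3-turbo",  # Groq's model id for Whisper Large v3 Turbo
    )
    return result.text.strip()


# Usage (requires `pip install groq` and GROQ_API_KEY in the environment):
#
#   from groq import Groq
#   with open("voice-note.ogg", "rb") as f:
#       print(transcribe_voice_note(Groq(), f.read()))
```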

Results since launch

The system has been in production since November 2025. The team went from 3 to 4 hours a day handling DMs to under 30 minutes of review (only the escalated cases). Response consistency improved visibly, because it no longer depends on who’s on shift.

The client can now scale the catalog (add more PDFs) without touching code. Updating takes two steps: upload the new document to Supabase and re-run the embeddings pipeline.
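
As an illustration of what re-running the embeddings pipeline involves, here is a minimal indexing sketch. It assumes the PDF text has already been extracted page by page; the chunk sizes, the `embed` callable, and the row keys (`product`, `page`, `content`, `embedding`) are hypothetical and do not reflect the actual table schema.

```python
from typing import Callable, Iterator


def chunk_page(text: str, size: int = 800, overlap: int = 100) -> Iterator[str]:
    """Yield overlapping character windows over one page of text."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + size].strip()
        if piece:
            yield piece
        if start + size >= len(text):
            break  # the last window already reached the end of the page


def build_rows(
    product: str,
    pages: list[str],
    embed: Callable[[str], list[float]],
) -> list[dict]:
    """Turn one document's pages into rows ready to insert into a vector table."""
    rows = []
    for page_number, page_text in enumerate(pages, start=1):
        for piece in chunk_page(page_text):
            rows.append({
                "product": product,
                "page": page_number,   # kept so answers can cite the page
                "content": piece,
                "embedding": embed(piece),
            })
    return rows
```

Storing the product and page number alongside each chunk is what lets the generated answer cite its source.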

Detailed tech stack

Component	Technology	Function
Orchestration	n8n	Main workflow, routing, escalation
Vector store	Supabase (pgvector)	Storage and semantic search
Transcription	Groq Whisper Large v3 Turbo	Voice note → text
Generation	Claude (Anthropic)	Contextual RAG response
Messaging	Twilio	WhatsApp API (fallback)
Omnichannel inbox	Chatwoot	Instagram inbox management and escalation
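
To make the "semantic search" row concrete: pgvector ranks rows by a vector distance, and its `<=>` operator is cosine distance (1 minus cosine similarity). The pure-Python sketch below mimics that ranking in memory for illustration only. In the real system the search runs inside Supabase, and the row shape here is hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], rows: list[dict], k: int = 3) -> list[tuple[float, dict]]:
    """Return the k rows most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, r["embedding"]), r) for r in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```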

Want something like this for your business?

Tell me about your operation. First call is free.
