Hey everyone!
A few days ago, I posted an article about a Python script I developed that helped me seamlessly transcribe and integrate 3,000+ visual slides full of crucial information into my RAG Knowledgebase—completely on autopilot.
Now, I’ve given the script a full makeover to make it as easy as possible to plug-and-play!
The article took off on Medium, so I knew I had to bring it here for you all to benefit:
This all started because I was developing a RAG chatbot for a client and...
I ran into a pretty big problem.
All the info he provided was in slides full of text & images!
I had to figure out a way to implement these slides into the Knowledgebase (and it had to work).
So I got to work and developed a script to do so on autopilot. It leverages the vision capabilities of the latest LLM models to transcribe the slides accurately—OCR just couldn’t cut it because it ignores the layout and misses out on crucial images.
But that’s not all! The script is loaded with extra features:
- SUMMARIZING & VECTORIZING: The script doesn’t just transcribe; it summarizes key concepts and creates vectors to ensure your Knowledgebase captures everything. These vectors are essential for data integration.
- FOLDER PROCESSING: It processes every subfolder in your directory, so no image gets left behind. Perfect for managing large datasets.
- SMART FILE NAMING: The script updates transcription filenames with vector counts, so you always know where you stand.
- MERGING TRANSCRIPTIONS: You can merge all transcriptions into a single file—whether by folder or into one master file—keeping your data organized and accessible.
- VECTOR COUNTING: Get a quick snapshot of your data volume with vector counts for each main folder—great for ensuring completeness.
- VECTOR UPLOADING: Finally, it uploads all vectors to the Qdrant vector store (but you can switch to another provider with a simple code tweak).
+ free prompt templates included!
Check out the attached video for a quick overview, and dive into the article for a detailed tutorial on how to use the script.
(Oh, and sorry for putting you through the ordeal of watching these videos of me talking—just trying to sharpen my camera skills)