After doing a bunch of projects on everything but AI Voice Agents, I finally gave in to the hype.
A few weeks ago, I showcased the final demo of a custom-coded AI Voice System built for one of the biggest IoT companies in Spain, and they seemed to love it!
Now, I know: the ticket is not that high, especially for such a big company. But, as stated, this is the first time I've stepped away from RAG systems & other solutions and gone into AI Voice Agents instead.
And I didn’t want to disappoint! (and no, sadly I couldn't make it a recurring subscription).
I managed to get this deal thanks to a professor at a university in Spain who leads research projects in collaboration with this big IoT company, and this should be the first (and cheapest) phase of the project. We're now heading into automated document generation using OpenAI + Google Docs (for URDs, SRDs, etc.).
Fully aware this is not the most impressive deal, but we're building things up slowly; there's more to come.
-----------------------------------------------------------------------
Project info:
📌 What is it?
Custom-built AI voice assistant that makes interactive, outbound calls at the time chosen by the user. It calls the IoT company's clients who want to develop software projects with them, and gathers all the information needed to kick off the project and create the required documents (URDs, SRDs…).
⭐️ What does it solve?
Without the voice agent, they have to fly an employee out to wherever the customer is located and speak with them in person to gather all the information required to start developing the software. An incredibly expensive and time-consuming process for the business.
Now they can just send a form to the customer, who fills in their information + a desired time for the call (right away or at a set date and time). The LLM then goes through a series of questions set by the company and keeps asking until it has gathered all the necessary data from the customer.
📚 How does it work?
(Quick Demo attached to this post)
⚙️ Workflow
- Client Interaction: Client submits a Google Form, which captures their initial information and preferred call timing.
- Call Initiation: The AI voice assistant, triggered by the form submission, initiates the outbound call at the chosen time (a rough sketch of this trigger follows the list).
- Data Collection: The assistant works through a dynamic set of questions until it has every detail the project team needs.
- Data Storage: Transcriptions and call logs are automatically stored in Google Sheets, making the information easily accessible for the IoT company’s project team.
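For anyone curious what that form-to-call trigger looks like in practice, here's a minimal sketch. It assumes the Google Form forwards submissions to a small webhook (e.g. via an Apps Script trigger), that scheduling happens in-process with APScheduler, and that place_call() is a hypothetical wrapper around the actual dial-out; the real system differs in the details.

```python
# Minimal sketch of the form-submission webhook + call scheduling.
# Assumptions: the Google Form forwards submissions here, APScheduler handles
# the delayed trigger, and place_call() is a stand-in for the real dial-out.
from datetime import datetime

from apscheduler.schedulers.background import BackgroundScheduler
from flask import Flask, jsonify, request

app = Flask(__name__)
scheduler = BackgroundScheduler()
scheduler.start()


def place_call(phone: str, name: str) -> None:
    """Stub for the actual outbound call (VAPI + Twilio in the real project)."""
    print(f"Calling {name} at {phone}...")


@app.route("/form-submitted", methods=["POST"])
def form_submitted():
    data = request.get_json()
    phone = data["phone"]
    name = data["name"]
    preferred_time = data.get("preferred_time")  # ISO 8601 string, or None = "call now"

    if preferred_time:
        run_at = datetime.fromisoformat(preferred_time)
        scheduler.add_job(place_call, "date", run_date=run_at, args=[phone, name])
    else:
        place_call(phone, name)

    return jsonify({"status": "scheduled" if preferred_time else "calling"}), 200
```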
✅ Features
- Dynamic call tracking on Google Sheets: all call info is updated there in real time, showing whether a call is scheduled, in progress, completed or failed, plus detailed logs.
- Multiple calling attempts: client is busy with another call? No problemo, the system will make up to 4 calling attempts at different times until it reaches them (see the retry sketch after this list).
- Human-like interactions: simple stuff like background office noises, pre-set phrases like ‘hello?’ or ‘are you still there?’ in case the customer doesn’t reply for a prolonged period of time, and a powerful LLM to make the experience feel as natural and fun as possible.
- Full call recording + transcription
- Easily change & improve the set of questions the assistant asks, just by modifying the prompt
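To give an idea of how that retry behaviour can work, here's a rough sketch. The retry delays, the Redis key layout and the reschedule_call() helper are all illustrative, not the client's actual configuration.

```python
# Rough sketch of the "up to 4 attempts" retry policy.
# Assumptions: call state is kept in a Redis hash per call, the delays are
# illustrative, and reschedule_call() stands in for the real re-queueing logic.
import redis

MAX_ATTEMPTS = 4
RETRY_DELAYS_MIN = [15, 60, 180]  # wait before attempts 2, 3 and 4

r = redis.Redis(host="localhost", port=6379, decode_responses=True)


def reschedule_call(call_id: str, delay_minutes: int) -> None:
    """Stub: re-queue the call after a delay (the real system uses its scheduler)."""
    print(f"Re-queuing call {call_id} in {delay_minutes} min")


def handle_call_result(call_id: str, answered: bool) -> str:
    """Update call state after an attempt and decide whether to retry."""
    attempts = r.hincrby(f"call:{call_id}", "attempts", 1)

    if answered:
        r.hset(f"call:{call_id}", "status", "completed")
        return "completed"

    if attempts >= MAX_ATTEMPTS:
        r.hset(f"call:{call_id}", "status", "failed")
        return "failed"

    r.hset(f"call:{call_id}", "status", "scheduled")
    reschedule_call(call_id, delay_minutes=RETRY_DELAYS_MIN[attempts - 1])
    return "retrying"
```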
🛠️ Tools used
- Custom coded in Python to maximize flexibility and scalability and keep costs down.
- VAPI: Used for transient voice agent handling, tying together GPT-4o, 11labs Turbo for human-like conversations, and transcription. (Initially tried OpenAI's Realtime API; it wasn't ideal.)
- Redis: For temporary storage of call data.
- Google Forms: Used to collect initial client data and trigger the AI workflow.
- Google Sheets: Serves as the database for storing transcriptions and call logs (see the logging sketch after this list).
- Twilio: Handles the actual outbound calls.
- Railway: Used for hosting, providing a scalable solution for the assistant’s deployment.
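And as a small taste of how the Sheets "database" side works, a sketch using gspread. The service-account file, spreadsheet key, worksheet name and column layout here are placeholders rather than the client's real setup.

```python
# Sketch of logging a finished call into Google Sheets via gspread.
# Assumptions: service_account.json, the spreadsheet key and the column order
# are placeholders; the real sheet layout belongs to the client.
from datetime import datetime, timezone

import gspread

gc = gspread.service_account(filename="service_account.json")
calls_sheet = gc.open_by_key("SPREADSHEET_KEY").worksheet("Calls")


def log_call(call_id: str, client_name: str, status: str,
             recording_url: str, transcript: str) -> None:
    """Append one row per call so the project team can review it later."""
    calls_sheet.append_row([
        call_id,
        client_name,
        status,  # scheduled / in progress / completed / failed
        datetime.now(timezone.utc).isoformat(),
        recording_url,
        transcript,
    ])
```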