That’s a great question! We partner with several transcription providers that use AI to identify different speakers based on the sound of their voice. This prevents all the speakers in a conference room from being bundled together as the same person. We also plan to add this functionality to our own transcription service in the coming months.
For internal use cases like recording your own meetings into Google Drive, the native tools work fine.
Where we come in is for companies building products that need to support all of their customers across Zoom, Meet, Teams, Webex, etc. Most enterprises don’t want five different integrations, and native APIs often come with restrictions (like only the organizer being able to access the file, or recordings not being available until after the call).
We already support diarization in the Desktop Recording SDK by capturing the meeting platform’s speaker-change events, so you get a diarized transcript plus precise “speaker started talking” timestamps out of the box. We also support voice-signature diarization via third-party STT providers for participants joining from the same room.
For in-person meetings and audio uploads, this is on our roadmap and in development. More to come on this!
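To make the speaker-change approach above concrete, here is a minimal Python sketch of the general idea: each transcribed word is attributed to whichever speaker most recently started talking. This is an illustration only, not our SDK's actual payload or API; `SpeakerEvent`, `Word`, and `diarize` are hypothetical names, and timestamps are assumed to be seconds from the start of the recording.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class SpeakerEvent:
    """Hypothetical 'speaker started talking' event."""
    start: float  # seconds from the start of the recording (assumed)
    speaker: str


@dataclass
class Word:
    """Hypothetical transcript word with its start time."""
    start: float
    text: str


def diarize(events, words):
    """Label each word with the most recent speaker-change event.

    Returns a list of (speaker, text) segments, merging consecutive
    words attributed to the same speaker.
    """
    events = sorted(events, key=lambda e: e.start)
    starts = [e.start for e in events]
    segments = []
    for word in sorted(words, key=lambda w: w.start):
        # Index of the last event that started at or before this word.
        i = bisect_right(starts, word.start) - 1
        speaker = events[i].speaker if i >= 0 else "unknown"
        if segments and segments[-1][0] == speaker:
            segments[-1][1].append(word.text)
        else:
            segments.append((speaker, [word.text]))
    return [(speaker, " ".join(texts)) for speaker, texts in segments]


events = [SpeakerEvent(0.0, "Alice"), SpeakerEvent(2.5, "Bob")]
words = [Word(0.2, "Hi"), Word(0.6, "there"), Word(2.7, "Hello")]
result = diarize(events, words)
```

With these sample inputs, `result` contains one segment for Alice followed by one for Bob. Words that arrive before any speaker event are labeled `"unknown"` rather than guessed. Overlapping speech and several people sharing one microphone are not handled here, which is where voice-signature diarization comes in.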
Just to clarify, we’re the infra layer that reliably captures and normalizes meeting data across platforms. The real value for users is what developers build on top: automated analysis, enrichment, and workflows, not the capture itself.
Modern LLMs can power sales coaching, medical scribing, legal review, support QA, and compliance reporting, but they need consistent inputs to process. We handle capture, formatting, and edge cases so developers can focus on models and UX.
I actually agree that it’s become incredibly easy to transcribe conversations using open-source models, and that’s not where Recall adds the most value. The hard part is building the infrastructure that allows you to get real-time access to the raw audio, video, and transcript data directly from the meeting platforms. We abstract all of that away and provide you with a clean interface to access that data. Once you get the data, you could use any of the models that you mentioned to do your own transcription, or transcribe using Recall’s transcription models.
$0.70/hr is our starter rate for low-volume testing. In production, developers typically see higher usage and choose to commit to volume and longer terms. Because of this, most teams we've seen don’t pay the starter price once they scale beyond early pilots.
Enabling transcription and recording on each platform, and remembering to hit record, makes the setup depend on each user. The host also often needs to install apps, which adds security friction, and you still have to build and maintain separate implementations for Zoom, Meet, and Teams, which is a cost devs often don't want to deal with.
Instead, we built a single API that gets the same results without the issues mentioned above, so you can focus on building the features your users care about.
You're right, and I agree that participants should be aware when they’re being recorded.
Because consent laws are complex and vary by region and industry, we leave the consent flow to the developer and provide the tools and guidance to do it correctly. As with our Meeting Bot API, we also urge teams to follow local laws and make recording clearly visible to users.