Hacker News | Nash0x7e2's comments

Looks awesome! I downloaded the app and was able to get it connected to my accounts, and it is working.

However, I did notice that connecting took around a minute to a minute and a half before the agent was in the call and able to speak. Is this a byproduct of the underlying calling service you're using, or of the traffic?

Regardless, awesome app, curious to see how it continues to improve!


Sorry for the bad experience. It is because of the sudden traffic. Looking into it.


Built a demo using Gemini Live and Ultralytics' YOLO models running on Stream's Video API for real-time feedback. In this example, I'm having the LLM provide feedback to the player as they try to improve their form.

On the backend, it uses Stream's Python SDK to capture the WebRTC frames from the player, send them to YOLO to detect their arms and body, and then feed them to the Gemini Live API. Once we have a response from Gemini, the audio output is encoded and sent directly to the call, where the user can hear and respond.
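For anyone who wants the shape of that flow without the SDKs, here is a rough sketch using only the standard library. Everything in it is an assumption for illustration: `Frame`, `detect_pose`, and `ask_model` are hypothetical stand-ins for the WebRTC frame source, the YOLO call, and the Gemini Live session, and none of them are real Stream, Ultralytics, or Gemini APIs. The only point is the structure: sample frames, run detection, hand the result to the model, and collect audio to send back.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Frame:
    index: int
    pixels: bytes  # placeholder for decoded image data


def detect_pose(frame: Frame) -> dict:
    # Hypothetical stand-in for a pose-detection model call.
    return {"frame": frame.index, "keypoints": []}


async def ask_model(frame: Frame, pose: dict) -> bytes:
    # Hypothetical stand-in for a live model session that returns audio bytes.
    await asyncio.sleep(0)
    return f"feedback-{pose['frame']}".encode()


async def run_pipeline(frames: list[Frame], sample_every: int = 2) -> list[bytes]:
    # A bounded queue keeps the frame producer from running far ahead
    # of the slower detection and model steps.
    queue: asyncio.Queue = asyncio.Queue(maxsize=4)
    audio_out: list[bytes] = []

    async def producer() -> None:
        for frame in frames:
            # Forward only every Nth frame to limit the load downstream.
            if frame.index % sample_every == 0:
                await queue.put(frame)
        await queue.put(None)  # sentinel: no more frames

    async def consumer() -> None:
        while True:
            frame = await queue.get()
            if frame is None:
                break
            pose = detect_pose(frame)
            audio_out.append(await ask_model(frame, pose))

    await asyncio.gather(producer(), consumer())
    return audio_out


def demo() -> list[bytes]:
    frames = [Frame(i, b"") for i in range(6)]
    return asyncio.run(run_pipeline(frames))
```

In a real setup the collected audio would be written to the call's outgoing audio track rather than returned as a list.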


Built a demo around integrating Gemini Live with Stream's Video API for agent use-cases. In this example, I'm having the LLM provide feedback to players as they try to improve their mini-golf swing.

On the backend, it uses the Python AI SDK to capture the WebRTC frames from the player, convert them, and then feed them to the Gemini Live API. Once we have a response from Gemini, the audio output is encoded and sent directly to the call, where the user can hear and respond.
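To make the "audio output is encoded and sent" step a bit more concrete, here is a small sketch of one piece of it: slicing raw PCM into fixed-duration frames before they are encoded for a WebRTC track. This is not the SDK's code. It assumes the model returns 16-bit mono PCM at 24 kHz and that the track wants 20 ms frames; both values, along with the helper names `frame_size_bytes` and `split_pcm`, are assumptions for illustration.

```python
SAMPLE_RATE = 24_000   # assumed output sample rate in Hz
BYTES_PER_SAMPLE = 2   # 16-bit mono PCM
FRAME_MS = 20          # assumed frame duration for the outgoing track


def frame_size_bytes(sample_rate: int = SAMPLE_RATE, frame_ms: int = FRAME_MS) -> int:
    # Samples per frame times bytes per sample.
    return sample_rate * frame_ms // 1000 * BYTES_PER_SAMPLE


def split_pcm(pcm: bytes, frame_bytes: int) -> list[bytes]:
    # Cut the buffer into equal-sized frames, padding the last one with
    # silence (zero bytes) so every frame has the same duration.
    frames = []
    for start in range(0, len(pcm), frame_bytes):
        chunk = pcm[start:start + frame_bytes]
        if len(chunk) < frame_bytes:
            chunk += b"\x00" * (frame_bytes - len(chunk))
        frames.append(chunk)
    return frames
```

With these assumed values each frame holds 480 samples, so a 2,000-byte buffer becomes three frames with the last one mostly silence.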

Is anyone else building apps around AI and real-time voice/video? Would be curious to share notes. If anyone is interested in trying for themselves:

Python SDK docs: https://getstream.io/video/docs/python-ai/basics/quickstart/
GitHub: https://github.com/GetStream/stream-py/tree/webrtc

