I also decided to learn WebRTC and built a video chat app project: https://zonko.chat
The last time I did any p2p networking was back in 2002 or something, when you still had to do it all manually. We used all sorts of fun tricks like NAT hole punching and little script endpoints that captured and forwarded along port and public IP address information.
It was fun to see that all of this has since been formalized under the "ICE framework". I was surprised to see that the STUN spec is only 12 years old now, despite the techniques involved being used for at least 20 years, probably more like 30+.
So if anyone who's new to this whole p2p world feels that WebRTC and the ICE framework are confusing or onerous, I would point out that just a short while ago these were basically just a handful of heuristic techniques developed through trial and error over the years. It's really much easier nowadays! zonko.chat only took me 12 or so hours to build (and seems to be well supported by Chrome and Firefox, even mobile).
Edit: Upon reflection, I don't even remember how I learned about some of them. The concept of TURN was probably one that I, and many thousands of others, invented from scratch due to necessity (failed to punch the hole? fall back to this custom relay I wrote in perl). STUN was an easy one to figure out yourself, too. I don't remember how I learned about hole punching though. Probably a forum or a book. Or possibly just an experiment ("what if the two connections touch somewhere in the internet at the same time... hey wait, it worked?") What's interesting to me is that the core "ICE" concepts (hole punching, STUN, TURN) are still pretty simple even in their mature, formalized, scientific form. But the concept of "SIP" is much more sophisticated today than it was back then.
> The last time I did any p2p networking was back in 2002 or something when you still had to do it all manually.
> It's really much easier nowadays! zonko.chat only took me 12 or so hours to build
While it may have only taken you 12 hours to build in "real time", I'd say you've been "building" it for the last 20 years. My guess is that if a newbie tried to do this, they could expect to spend a few weeks or more on the project.
That's a good point! I did not need to relearn core concepts.
The bulk of that 12 hours was actually spent debugging negotiation and timing/race issues in the signaling/SIP layer. Even for me, figuring out how the WebRTC API is supposed to work was a little difficult.
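For anyone else hitting those races: the "perfect negotiation" pattern now documented on MDN resolves most of them. A hedged sketch in TypeScript (signaler stands in for whatever signaling channel you own, and polite is agreed out of band so exactly one side yields when offers collide):

    // Perfect negotiation, adapted from the MDN pattern.
    declare const signaler: {
      send(msg: any): void;
      onmessage: (msg: any) => void;
    };
    const polite = true; // the polite peer yields on offer collisions
    const pc = new RTCPeerConnection();
    let makingOffer = false;

    pc.onnegotiationneeded = async () => {
      makingOffer = true;
      try {
        await pc.setLocalDescription(); // implicit offer
        signaler.send({ description: pc.localDescription });
      } finally {
        makingOffer = false;
      }
    };

    signaler.onmessage = async ({ description, candidate }) => {
      if (description) {
        const collision = description.type === 'offer' &&
          (makingOffer || pc.signalingState !== 'stable');
        if (collision && !polite) return; // impolite side ignores the glare
        await pc.setRemoteDescription(description);
        if (description.type === 'offer') {
          await pc.setLocalDescription(); // implicit answer
          signaler.send({ description: pc.localDescription });
        }
      } else if (candidate) {
        await pc.addIceCandidate(candidate);
      }
    };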
I hope I didn't come off as overly aggressive; the comment was mostly for myself. I was feeling bad that there was no way I would be able to implement something like this in 12 hours.
Often veterans here will talk about projects they did trivially, but unless you dig into their profile and find out who they are, it feels like every Joe Schmo on HN is a 1/2mil+ SWE at Google.
No no, that didn't sound aggressive at all! Though I still wouldn't say the project was trivial, just very limited in scope. And had I done everything perfectly -- meaning, being able to stream code from my head with no errors -- the project may only have taken 3 hours. That's how small it is. I think the finished product is 800 lines of code, server and client together. It's conceptually very small too. (Maybe this is a 'veteran' skill as well, being able to keep things small.)
My point is that a whole 75% of the time I spent on this project was just me flailing around with stuff that wasn't working as I expected (that's relatable at all experience levels!). Perhaps it's true that veterans can get through certain things more quickly than novices can, but we're not immune to the 80/20 rule either! We just get stuck on different types of problems.
I worked for a company developing a custom P2P video streaming protocol. It was amazingly hard to get working properly and testing all the quirks in different routers from different manufacturers. The combination of all the NAT strategies (I think there were 5 main ones) created quite a matrix of possible ways that things can work, and you had to implement and test them all before there was any kind of standard way to do it. We built a mock internet with dozens of consumer routers, as many as we could get our hands on. We eventually got it to work and that company exists to this day, but I think they switched to WebRTC long ago.
WebRTC seems easy when you're creating a proof of concept within your own network; once you get into complex situations behind firewalls across the internet, it's a whole different story.
The article mentions: "This section we will just touch and go about when do you need a TURN server. It is not needed in all situations but a component needed if you have to deal with slightly less straightway use cases." In practice, a TURN server is a must in the real world...
True, but the problem you're mentioning is solved, for the most part. STUN/TURN servers serve this purpose; there are open source ones[0][1] and you can even use public STUN servers[2] (as you might expect, free TURN servers aren't really a thing).
Solutions like Jitsi Meet[3] and Whereby (formerly appear.in)[4] use WebRTC to great success and are fantastic for quick meetings.
WebRTC these days is pretty mature and ready for primetime -- TURN servers are a last resort, but they're a small price to pay for something that might be free (to you the service provider) most of the time, if signaling succeeds.
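For a side project, the happy-path config really is tiny; a minimal sketch using one of the public Google STUN servers mentioned above:

    // Public STUN covers address discovery; add a TURN entry (with
    // credentials) when you need a relay fallback.
    const pc = new RTCPeerConnection({
      iceServers: [{ urls: 'stun:stun.l.google.com:19302' }],
    });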
I do a significant amount of work with hospitals and secure environments (military, etc). TURN is needed 100% of the time. P2P traffic is not allowed, and all IP addresses need to be known upfront and kept static for firewall whitelisting.
This means products which help alleviate WebRTC infrastructure, such as AWS Kinesis, are not allowed (due to how they allocate TURN servers with unknown IP addresses). A company either needs to manage its own infrastructure and TURN servers, which lets you cherry-pick server locations (for HIPAA and country-specific legal requirements on what is streamed), or accept the large IP ranges of Twilio and its competitors (giving up server location flexibility and accepting increased commercial and market growth restrictions).
Whichever route you go down it is quite an undertaking!
P.S. Tsahi Levent-Levi is truly exceptional in this area. I highly recommend reading his blog and training courses: https://bloggeek.me/, https://webrtccourse.com/, and he runs an amazing testing product, https://www.testrtc.com. If you build your own infrastructure, testRTC is a must.
STUN is only useful if you're trying to negotiate a P2P connection, which isn't the case when using an SFU. If everything you're doing is going through an SFU then you don't need STUN.
I think the most frustrating part is that people don't know what NAT they are behind. I wish WebRTC easily told people the attributes of their NAT. Not sure if it would help production deploys, but would make learning a lot easier.
Someone contributed a really cool tool to Pion, stun-nat-behavior[0], that I use a lot. It prints out NAT details using the modern, correct terminology. I see a lot of docs that still talk about "symmetric NAT" etc.; RFC 4787[1] recommends against all that.
After hand-rolling my own setups and working with a few libraries, I have found https://mediasoup.org/ v3 to be the easiest library to use that still gives me the freedom to work with the architecture I want. This of course assumes you're not using WebRTC for its p2p capabilities and are willing to scale via SFUs, which is a common approach these days.
STUN fails under symmetric NAT, not strict NAT. That Google document cites no source for its 92% figure, but I assume that's for desktop traffic only. Pretty much all mobile/cellular connections would require TURN too.
The issue with webrtc is once you step out of the side-project domain, you have to confront the endless implementation differences between browsers, whether it's undocumented SDP behavior, different codecs, non-conformant behavior for low level calls, etc.
We've built a push-to-talk walkie talkie system called Squawk[0] which holds long-lived WebRTC connections in the background throughout the day. We use simplepeer[1] as the base to help bootstrap some of the browser shimming, but it's not perfect. So ultimately we've had to build all sorts of checks into our protocols, like an audio keepalive where we send periodic 20 ms frames of silence down the media channel and verify that we received some additional header bytes on the remote end. Otherwise WebRTC would let the connections rot, and you wouldn't know until you needed them, which in a push-to-talk situation is too late.
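For the curious, here's a rough sketch of one way to synthesize those silent frames with WebAudio (not our exact implementation; names are illustrative):

    declare const pc: RTCPeerConnection; // your existing connection

    // An oscillator muted by a zero-gain node still produces a steady
    // stream of (silent) audio frames, keeping the media path warm.
    const ctx = new AudioContext();
    const dest = ctx.createMediaStreamDestination();
    const osc = ctx.createOscillator();
    const mute = ctx.createGain();
    mute.gain.value = 0;
    osc.connect(mute).connect(dest);
    osc.start();
    pc.addTrack(dest.stream.getAudioTracks()[0], dest.stream);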
Also, to increase adoption, perhaps give users a link to the app.squawk.to URL to use directly as well. I tried that on my personal device and it works as advertised.
Out of those, Jitsi and Mediasoup seem to be better as SFUs than the rest because they have what appears to be decent-looking congestion control, bitrate allocation, and support for simulcast. The rest apparently do not (at least I couldn't find anywhere in the code where it happened).
Also interesting to note is that so many of the newer ones are written in Go based on Pion. If Pion ever gains the ability to do decent congestion control (perhaps based on transport-cc like Jitsi and Mediasoup do), that could improve things for all of those.
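As an aside, the simulcast half of that is already exposed in the browser API; a hedged sketch (the rid names and bitrates are illustrative, and the SFU has to understand simulcast for this to pay off):

    const stream = await navigator.mediaDevices.getUserMedia({ video: true });
    const pc = new RTCPeerConnection();
    // Send three spatial layers; the SFU forwards whichever fits
    // each receiver's bandwidth.
    pc.addTransceiver(stream.getVideoTracks()[0], {
      direction: 'sendonly',
      sendEncodings: [
        { rid: 'q', scaleResolutionDownBy: 4, maxBitrate: 150_000 },
        { rid: 'h', scaleResolutionDownBy: 2, maxBitrate: 500_000 },
        { rid: 'f', maxBitrate: 1_500_000 },
      ],
    });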
What you want are transcoding features -- some of the servers offer them (e.g. Kurento[0], which is not listed above), but some don't (e.g. mediasoup[1]), and some offer recording but you need to wrangle formats yourself (e.g. Janus[2]).
IPv6 is becoming increasingly common here in Asia (I'm based in Thailand but travel a lot around Asia). For context, I have used 3 different ISPs here and all have dual stack; both my cellular connections have also been dual stack.
This has meant NAT is less of an issue for native IPv6 endpoints, including P2P.
Hopefully when IPv6 is finally widespread in US/Europe we will see stuff taking more advantage of this fact.
I'm eager to create a higher quality video broadcasting (not web meeting, one way only) app for some local yoga studios I help out with and am hoping this article gives me a push in the right direction.
The audio quality on Zoom is just terrible, whether you disable DSP or not.
So many yoga classes require high quality music.
It's frustrating that Chaturbate provides top-notch video and audio quality essentially for free, while paying $20/mo for Zoom gives you what looks like 380p video, and audio quality I have yet to find a sufficiently poor comparison for...
Does anyone know how one could emulate what chaturbate does?
Any good articles outlining how they do what they do?
Ideally, the teacher would just plop their phone down in front of them, hit broadcast, and a few seconds of buffering later 1080p video and quality audio would be visible through a browser.
Why is that so tough to do??? I haven't been able to find a single article that simplifies or distills it at all.
From a technological perspective, streaming with a couple of seconds of delay is a world of difference from streaming with sub-second latency. You can account for network dynamics with more buffers, encode in higher quality (even using multiple passes), transcode to multiple targets, etc.
Zoom getting the music audio through the mic sounds like the real problem. You should be aiming to stream the audio from a digital source; then you could have the song titles overlaid on the video. There are definitely licensing issues, though the instructors are probably already not using legit licenses for their classes.
Also a lot of audio codecs are tuned towards speech and filter out high frequencies. You should pick one meant for music.
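In WebRTC terms, Opus can be coaxed into a music-friendly mode. A hedged sketch that munges the SDP (parameter names are from RFC 7587; the regex assumes a Chrome-style fmtp line, and the remote end is free to ignore the hints):

    declare const pc: RTCPeerConnection; // your existing connection

    function preferMusicOpus(sdp: string): string {
      // Request stereo and a higher target bitrate on the Opus fmtp line.
      return sdp.replace(
        /(a=fmtp:\d+ .*useinbandfec=1)/,
        '$1;stereo=1;maxaveragebitrate=256000',
      );
    }

    const offer = await pc.createOffer();
    await pc.setLocalDescription({ type: 'offer', sdp: preferMusicOpus(offer.sdp!) });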
Most of the streaming platforms use HTTP Live Streaming (HLS) because it avoids all the networking NAT headaches that come with p2p connections, and it handles variable quality better because each client fetches the best quality for its bandwidth. As far as I know, with WebRTC the sender degrades quality to satisfy the slowest peer.
That said, the downsides of HLS are potentially higher infrastructure costs, since you need to transcode the video to the different qualities, and, somewhat related, higher latency to live. With proper tweaking you might get 2-3 seconds of latency, but that might be too much for your use case.
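On the client side, HLS playback is only a few lines; a minimal sketch using the hls.js library (the manifest URL is a placeholder for wherever your transcoder publishes):

    import Hls from 'hls.js';

    const video = document.querySelector('video')!;
    const src = 'https://example.com/live/playlist.m3u8'; // hypothetical URL
    if (video.canPlayType('application/vnd.apple.mpegurl')) {
      video.src = src; // Safari plays HLS natively
    } else if (Hls.isSupported()) {
      const hls = new Hls({ lowLatencyMode: true });
      hls.loadSource(src);
      hls.attachMedia(video);
    }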
If there is voice interaction between the yoga instructor and their students the HLS delay will certainly be noticeable.
> WebRTC the sender degrades quality to satisfy the slowest peer
No, WebRTC is 1 to 1. Each connection is adapted independently. But, you can build services that have rooms with more participants, then it's up to you to shape the traffic as you want. If you use a central server (SFU), it can just send each peer the best they can receive, each independently from one another. It's a property of the service, not the technology.
I looked into that but it's not ideal mainly because we don't need a video conferencing system.
I suppose I could dig through the code and disable everyone's video feeds but the host's, but I don't have a lot of time to dedicate to this project, unfortunately.
Isn't your use case just a one-to-many stream, like YouTube, Twitch, Periscope, etc. all provide? That should be easier than a proper n-way meeting.
While the video streaming isn't that great on Zoom, my experience is that the audio quality piped through their custom audio mode for music is pretty good.
I've been looking into WebRTC and used the "WebRTC samples", which are good in many ways. It is fairly easy to get something up and running, but I found several areas that were difficult:
* Debugging. One user's sound just doesn't work while it works perfectly for me on different machines. I am clueless as to how to debug it.
* ICE. While it works, I had a hard time understanding, tracking, and debugging what was going on.
* Closing and restarting connections.
* Multiple clients in one room?
* Echo cancellation. This was frustrating for users.
* TURN. Is there a tool or way to know which clients need a TURN server? Or are using a TURN server?
I ended up concluding that turning it into an actual product would be fairly time-consuming.
WebRTC doesn't do everything for you; it's really just responsible for tying together ICE with media streams. Signaling is up to you to figure out. For instance, multiple clients in one room: this is part of the signaling layer and is not WebRTC's responsibility (I built this into zonko.chat if you want to see how it works though).
Closing and restarting connections is signaling-layer stuff too, i.e. your responsibility.
Echo cancellation is really supposed to be application layer and up to you as well, but I think this will probably shift to be the browser's/WebRTC's/getUserMedia's responsibility at some point.
Re. TURN: ICE is the process that works out whether a specific client needs to relay through a TURN server. The question is: do you need to deploy a TURN server? The answer is yes. If you build a P2P app that you want to work for all users, you will always need a TURN server. You can run coturn on the same box that you serve your app from; most likely a side project will never hit the scale requiring more than a $5 DigitalOcean box for TURN.
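A hedged sketch of both halves: pointing the browser at your own coturn box (hostname and credentials are placeholders) and then using getStats() to see whether the nominated candidate pair actually went through the relay:

    const pc = new RTCPeerConnection({
      iceServers: [
        { urls: 'stun:turn.example.com:3478' },
        { urls: 'turn:turn.example.com:3478', username: 'user', credential: 'secret' },
      ],
    });

    // True if the selected candidate pair uses a relayed (TURN) candidate.
    async function usingTurn(conn: RTCPeerConnection): Promise<boolean> {
      const stats = await conn.getStats();
      let relayed = false;
      stats.forEach((report) => {
        if (report.type === 'candidate-pair' && report.nominated &&
            report.state === 'succeeded') {
          const local = stats.get(report.localCandidateId);
          if (local && local.candidateType === 'relay') relayed = true;
        }
      });
      return relayed;
    }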
And yes, it should not be a surprise that products are time consuming to build :) WebRTC is plumbing; you probably were expecting something more like Jitsi.
FYI, echo cancellation actually does work (Chrome definitely); just make sure you specify the audio constraint so that it has a sample rate of 16 kHz (AEC does not work in the default 44.1/48 kHz modes).
> Echo cancellation is really supposed to be application layer and up to you as well, but I think this will probably shift to be the browser's/WebRTC's/getUserMedia's responsibility at some point.
Echo cancellation typically can't be application layer. The APIs I've seen (Android, iOS, WebRTC) require low latency and work best as close to the hardware as possible.
{ echoCancellation: true } as a track constraint in getUserMedia works.
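For example (a minimal sketch; browsers treat constraints as hints, so read back getSettings() to see what you actually got):

    const stream = await navigator.mediaDevices.getUserMedia({
      audio: {
        echoCancellation: true,
        noiseSuppression: true,
        sampleRate: 16_000, // per the sibling comment; a hint, not a guarantee
      },
    });
    console.log(stream.getAudioTracks()[0].getSettings());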
I've never actually gotten { echoCancellation: true } to work for me, but your sibling comment does have a suggestion I need to try out!
Echo cancellation is a pretty lightweight DSP/FIR task. Whether you do it close to the hardware (though I suspect that's not actually the case with getUserMedia; it's still an audio-stream algorithm) or in the application layer, echo cancellation requires the same amount of added latency.
But in any case, I did say I suspected echo cancellation would shift to getUserMedia. It's not fully there yet, but it will be.
It depends what the product is. If you're trying to build another Zoom (which I gather from the "rooms" question), yes, it will take quite some time. For one thing, the mesh topology of P2P won't scale up beyond a handful of users, so you'll need to make it client/server. And besides time-consuming, that starts to get operationally expensive. Decoding, compositing, and encoding high-resolution video streams in real time take some processing power.
If you want to try a platform that abstracts some parts of it (such as signaling) and aims to provide an all-in-one package (compared with WebRTC, which is a collection of puzzle pieces that you are responsible for putting together), have a look at OpenVidu.
The team behind Kurento is working on this (I am part of it) for people who don't really care about all the intricacies of the standard(s) and just want to build a product on top of it. A single Docker container to deploy, and you're all set to write your app.
Still, this is a complex topic, so there are a thousand ways this technology can be made easier to use and understand. And I agree with other comments about the issue of debugging; there is totally an empty space in the market for a comprehensive solution that can help with troubleshooting when WebRTC fails.
The debugging bit is so frustrating. I spent almost a whole week trying to find a bug in my PeerConnections, only to find out that the TURN server was misconfigured (even though Trickle ICE was successful). And even then, just setting up a TURN server consumed a whole day.
WebRTC is end-to-end encrypted by default. There is a signaling server that helps establish the connections between users in a room, but after that the communication is encrypted. The TURN and STUN servers are only required for technical reasons to get peer-to-peer working, so no content is ever passed unencrypted.
That's the difference from other services like Zoom and Jitsi, where a server in the middle receives the video streams unencrypted and then redistributes them. Although Jitsi is adding end-to-end encryption support as well soon.
We started a simple webrtc app in 2018. Thought it would be simple. Now two years later we are still tweaking the code and dealing with handshakes and codecs across browsers, as well as edge cases involving firewalls and what to do if someone disconnects for longer than the timeout.
One example: H.264 is hardware-accelerated on iPhone, so one might prefer it over VP8, which can drain the device's battery pretty quickly when used in a P2P mesh setup.
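If you want to steer that choice from the app, setCodecPreferences can put H.264 first; a hedged sketch (it falls back silently to the default order if H.264 isn't available):

    const pc = new RTCPeerConnection();
    const transceiver = pc.addTransceiver('video', { direction: 'sendrecv' });
    const caps = RTCRtpSender.getCapabilities('video');
    if (caps) {
      // Reorder so H.264 payloads come first in the generated SDP.
      const h264 = caps.codecs.filter((c) => c.mimeType === 'video/H264');
      const rest = caps.codecs.filter((c) => c.mimeType !== 'video/H264');
      if (h264.length > 0) transceiver.setCodecPreferences([...h264, ...rest]);
    }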
First off, I'm a total noob when it comes to WebRTC, but having read the docs Google provides for it (and the accompanying mini video app tutorial), it seemed like it's dead simple to implement and use. I understood that the complexities, and basically everything you talked about above, were already handled. Is your implementation different from theirs, or did I maybe misunderstand the value proposition of WebRTC?
Twilio costs money, but it's not a bad idea. I created Remotehour (https://remotehour.com), which allows you to have an "open-door policy" video call easily. It works with Twilio :)
This reminds me of Icecomm[0] from a few years back.
Unfortunately, it didn't stick around for too long. It was pretty easy to use, as well, and a lot of people here ended up in a video chat together[1]. LOL!
I have some experience in this from developing https://p2p.chat a while back.
As others have mentioned, building a simple project is fairly simple. The difficulty comes when you want to scale to more than ~4 users without the app becoming unusable. Adjusting audio/video constraints to ensure that you get optimal media streams is also quite difficult, never mind dynamically tweaking them!
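For the constraint-tweaking part, applyConstraints works on a live track; a small sketch (values are illustrative, and the browser treats them as targets):

    const stream = await navigator.mediaDevices.getUserMedia({ video: true });
    const [track] = stream.getVideoTracks();
    await track.applyConstraints({ width: { ideal: 640 }, frameRate: { max: 24 } });
    console.log(track.getSettings()); // what was actually applied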
In real life, the STUN server rarely works on its own, and thus the myth of this peer-to-peer utopia was never realised, which is why WebRTC did not receive much attention.
A small group of friends and I are working on a virtual karaoke club using WebRTC and Go, https://github.com/ryanrolds/club. 100% agree that WebRTC makes it easy to create proofs of concept, but there are a lot of edge cases and browser differences that have to be worked through.
The current plan is to keep everyone in "groups" (think friends at a table) small, max 6. The server will maintain peer connections with everyone in the "room" and broadcast the singer via that peering. As the singer changes, the server will simply allow the KJ to pick who is getting broadcast over the other server -> client peers.
I really learnt so, so much from the entire thread discussion here today! WebRTC, I've gathered, is oftentimes easy to get started with, but the real challenge is in the scalability. From the way it seems, scaling is only possible with a forwarding architecture using a Selective Forwarding Unit or the like.
I always wonder if there is a way to think outside of this 'box'.
One thing I've wanted to make is something like gather.town where I can remix the audio and video so that different users sounded louder or quieter. But, I never figured out where in the WebRTC API that is done. It seems like I need to set up my own SFU and put the necessary logic over there.
Are you trying to adjust the volume of the remote audio streams? Can you change the volume property of the `audio` element in the DOM?
I think you could also do it with the WebAudio API. If you throw up a repo would love to try and help :) having a backend makes it so much harder to deploy/maintain stuff.
It's not in the WebRTC API. You can change the volume on the audio tags you create, or you can pipe the audio media stream into a WebAudio graph and modify it there.
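A minimal sketch of the WebAudio route (remoteStream would come from your ontrack handler; note that Chrome has historically needed the stream attached to a muted audio element before WebAudio receives any data):

    const ctx = new AudioContext();

    // Give each remote participant their own gain node so volumes can
    // be adjusted independently.
    function playWithVolume(remoteStream: MediaStream): GainNode {
      const src = ctx.createMediaStreamSource(remoteStream);
      const gain = ctx.createGain();
      src.connect(gain).connect(ctx.destination);
      return gain; // later: gain.gain.value = 0.25 for a quieter peer
    }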
I've built a number of WebRTC apps over the years. Recently, I built just such a thing as you described and open sourced it: https://www.calla.chat. I opted to build it on top of Jitsi Meet this time. It's actually advantageous that it's not through the WebRTC API because Jitsi doesn't give access to the raw WebRTC commands. But hijacking the audio elements it creates is completely doable.
Sadly, WebRTC p2p does not scale. Sadly, current HTML-based solutions for media servers are slow and highly CPU-intensive. Even more sadly, Adobe Flash solved the problem of multi-party video chat decades ago, but we decided to deprecate it without an alternative.
This article is a strange one. They mention WhatsApp and some other mobile products, but then proceed to frame everything in the context of a browser.
It's interesting to see the rise of unified communications again. All this technology, specifically WebRTC, has been around for a few years now. Innovation is minimal -- why? Because most of the problems are solved. When a technology is mature, most of the focus is on security or on applying other technologies to improve it, such as machine learning. VoIP and video apps are very mature, having been built up since the inception of H.323, SIP, SCCP, RTP, SRTP, and most recently JS and WebRTC.