WebRTC: The Best Open-Source Technology for Live Streaming

Luca Moreira

Learn how EiTV uses WebRTC open-source technology to broadcast audio and video in real time without CDN costs.

Rodrigo Cascão Araújo

CEO and Founder – EiTV

Rodrigo Cascão de Araújo, PhD in Computer Engineering and an expert in streaming and digital TV technologies. Founder of EiTV, a leading and pioneering company in the field of digital television operating in Brazil and Latin America. Founder and advisor of the Brazilian Forum of Digital Television. A renowned specialist in technological innovation, trained at Babson College and Stanford University.

A dedicated family man in love with his wife and two daughters. He has always enjoyed sports and currently practices beach tennis and cycling. Fishing is another seasonal hobby, and he also holds a black belt in karate.

November 18, 2022

The origins of WebRTC technology date back to 1999, when Global IP Solutions (GIPS) was founded in Sweden. GIPS had already developed many of the codecs and echo cancellation techniques that underpin real-time communication. When Google acquired this VoIP (voice over IP) and videoconferencing company in 2010, it turned these real-time communication components into a browser-based open-source project and pushed it towards standardization by the W3C (World Wide Web Consortium)[1].

WebRTC (Web Real-Time Communication) is currently a collection of protocols, standards, and JavaScript APIs aimed at enabling peer-to-peer communication on the Web. WebRTC can use a wide variety of protocols to achieve ultra-low latency streaming, including signaling, STUN, TURN, and ICE protocols. While it also acts as a streaming protocol, it can work in conjunction with other streaming protocols including Real-Time Messaging Protocol (RTMP), Real-Time Streaming Protocol (RTSP) and WebRTC HTTP Ingestion Protocol (WHIP).

Communication Components via WebRTC Technology

Several components are involved in establishing communication via WebRTC technology. Developing a solution based on WebRTC requires familiarity with an extensive and often difficult body of theory.

Next, we present each of the components needed to build an audio and video connection between devices via WebRTC.

Signaling and SDP Protocol

Signaling refers to how a communication session between two or more agents is configured and controlled. Agents that want to join a live communication connect to a signaling server, which relays session information between them so that each end knows how the other will send and receive media.

This communication can be bidirectional: each source encodes its stream and sends it at a suitable resolution to the other end. Signaling uses the SDP (Session Description Protocol), which carries various details of the communication between agents:

  • The network location of each agent, i.e. its IP address.
  • The audio and video tracks each agent is able to receive.
  • The audio and video tracks each agent will send.
  • The data channels to be opened, along with the media types and resolution levels to be used.

Signaling allows agents to exchange the metadata needed to coordinate communication. An application using WebRTC technology relies on browser support, but internally the browsers communicate through the messages they exchange via the signaling server. This is where SDP comes in: it describes what each agent offers and accepts, so that a direct connection between agents can then be established with the help of STUN and TURN servers.
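
As an illustration, the sketch below shows a minimal offer/answer exchange in JavaScript. Each agent would run code like this on its own RTCPeerConnection; sendToSignalingServer() is a hypothetical helper standing in for whatever signaling channel (a WebSocket, for example) the application provides:

  // Caller side: create an SDP offer and hand it to the signaling channel.
  const pc = new RTCPeerConnection();

  async function startCall() {
    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    sendToSignalingServer({ type: 'offer', sdp: pc.localDescription });
  }

  // Callee side: apply the remote offer, answer it, and send the answer back.
  async function handleOffer(offer) {
    await pc.setRemoteDescription(offer);
    const answer = await pc.createAnswer();
    await pc.setLocalDescription(answer);
    sendToSignalingServer({ type: 'answer', sdp: pc.localDescription });
  }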

STUN Servers

STUN is an acronym for Session Traversal Utilities for NAT. STUN servers allow an agent to discover its public IP address by making requests to them, and they work together with the NAT devices that hide agents’ private addresses.

Network Address Translation (NAT) devices help keep agent IP addresses private, both for security and for address conservation. They put devices on a private IP network through which they can access the Internet without revealing their true IP addresses. This makes it harder for attackers to reach agents directly and mitigates the exhaustion of the limited pool of public IPv4 addresses.

STUN servers map agents to their public addresses so that audio and video streams can be shared directly between them.
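
In the browser, pointing a connection at a STUN server is a one-line configuration, as in the sketch below; the public Google STUN address is used purely as an example:

  // Configure the peer connection with a STUN server so the agent can
  // discover its public (server-reflexive) address behind NAT.
  const pc = new RTCPeerConnection({
    iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
  });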

TURN Servers

TURN is an acronym for Traversal Using Relays around NAT. TURN servers help establish connections between agents when a direct connection is not possible due to firewall or restrictive NAT configurations.

TURN servers also preserve the privacy of the agents, since each peer sees only the relay’s address rather than the other party’s IP. The TURN server allocates a temporary relay address for each agent and forwards traffic in both directions, acting as a proxy.
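
A TURN server is added to the same iceServers list together with credentials. In the sketch below, the hostname, username, and password are placeholders:

  // TURN relays media when a direct path cannot be established.
  // 'turn.example.com' and the credentials are placeholders.
  const pc = new RTCPeerConnection({
    iceServers: [
      { urls: 'stun:stun.l.google.com:19302' },
      {
        urls: 'turn:turn.example.com:3478',
        username: 'webrtc-user',
        credential: 'webrtc-password'
      }
    ]
  });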

Connection

Connection refers to establishing two-way communication between agents. In WebRTC technology, communication takes place over peer-to-peer (P2P) connections rather than client-server connections: media flows directly between the agents’ transport addresses.

Establishing such a connection can be difficult, because agents may sit behind different network configurations and expose different transport addresses. These difficulties complicate the setup, but they can be resolved with ICE.

ICE Servers

ICE (Interactive Connectivity Establishment) is the protocol that figures out the best possible way to connect agents. Each agent gathers the transport addresses at which it can be reached and publishes them as candidates.

ICE then selects the best possible connection among the candidate pairs, even when the agents’ locations are difficult to determine. To resolve those difficulties efficiently, it relies on the STUN and TURN servers described above.
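
The sketch below shows how candidates are exchanged in the browser, again assuming the hypothetical sendToSignalingServer() helper from the signaling example:

  // Create the connection with the STUN/TURN servers described above.
  const pc = new RTCPeerConnection({
    iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
  });

  // As ICE gathers candidates locally, forward each one to the remote agent
  // through the signaling channel.
  pc.onicecandidate = (event) => {
    if (event.candidate) {
      sendToSignalingServer({ type: 'candidate', candidate: event.candidate });
    }
  };

  // When a candidate arrives from the remote agent, hand it to ICE so it can
  // be paired and tested against the local candidates.
  async function handleRemoteCandidate(candidate) {
    await pc.addIceCandidate(candidate);
  }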

Figure 1 – Standard WebRTC Workflow

Security

WebRTC technology requires real-time communication to take place securely. It ensures that all information shared between agents is encrypted and remains confidential. The DTLS and SRTP protocols are used to guarantee this security in WebRTC connections.

DTLS Protocol

The Datagram Transport Layer Security (DTLS) protocol allows WebRTC technology to establish secure, encrypted communication between agents. Before they can exchange media, the two endpoints perform a DTLS handshake to negotiate the encryption keys.
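
The handshake itself is performed automatically by the browser, but the certificate it uses can be pre-generated and passed to the connection, as in this sketch:

  // Pre-generate the certificate used in the DTLS handshake; if omitted, the
  // browser creates one automatically when the connection is constructed.
  async function createSecureConnection() {
    const cert = await RTCPeerConnection.generateCertificate({
      name: 'ECDSA',
      namedCurve: 'P-256'
    });
    return new RTCPeerConnection({ certificates: [cert] });
  }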

SRTP Protocol

SRTP is an acronym for Secure Real-time Transport Protocol. It protects and encrypts media streams between connected agents. It is initialized using keys generated by DTLS. This protocol is specifically designed to encrypt RTP packets.

RTP Protocol

RTP stands for Real-time Transport Protocol. This protocol is designed to carry real-time audio and video, and it lets an agent multiplex several media feeds over a single connection. RTP itself does not guarantee low latency or reliability, but it provides a tool to monitor and manage them: the RTCP protocol.

RTCP Protocol

RTCP is an acronym for RTP Control Protocol. It allows administrators to monitor call quality via collected metadata. The protocol lets an agent attach whatever metadata it needs to report call statistics, and it also tracks packet loss, latency, and other VoIP issues.
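
In the browser, the statistics derived from RTCP reports are exposed through the getStats() API. The sketch below reads packet loss, jitter, and round-trip time from an existing connection:

  // Poll connection statistics (derived in part from RTCP reports) to check
  // packet loss, jitter, and round-trip time on an existing RTCPeerConnection.
  async function logCallQuality(pc) {
    const stats = await pc.getStats();
    stats.forEach((report) => {
      if (report.type === 'inbound-rtp') {
        console.log('packets lost:', report.packetsLost, 'jitter:', report.jitter);
      }
      if (report.type === 'candidate-pair' && report.state === 'succeeded') {
        console.log('round-trip time (s):', report.currentRoundTripTime);
      }
    });
  }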

UDP Protocol

The User Datagram Protocol (UDP) is the communication protocol that WebRTC technology uses to stream media. In contrast to TCP (Transmission Control Protocol), UDP prioritizes speed over reliability.

TCP operates with what is called a client/server handshake, making the connection more reliable but slower. The recipient’s device acknowledges receipt of the data packets, so the sender’s device knows whether packets are being lost. UDP has no such handshake: it sends data packets with no guarantee that they arrive, or that they arrive in the correct order. This results in a much faster flow, but with reduced reliability.

Challenges and Benefits of WebRTC Technology

The open source WebRTC technology has some pros and cons. Let’s take a moment to review the benefits and challenges that need to be overcome to help companies achieve their streaming goals.

Benefits of WebRTC Technology

Open-source — The source code for WebRTC is available to anyone who wants to use it. This makes it possible for a community of developers to come together and improve the technology. Access to the source code is also essential when elements need to be added to extend or adapt the workflow of WebRTC-based solutions.

Platform independent — As its streaming protocols are browser-based, WebRTC technology does not require any special hardware or other equipment.

Standardized security — In addition to benefiting from browser-specific security, WebRTC technology imposes its own strict security measures, including the use of HTTPS for communication with signaling servers and SRTP for encrypting media streams.

Ultra-low latency — This is the main differentiator of WebRTC technology. It is the fastest streaming technology available, typically delivering latency under 500 milliseconds.

Low CDN cost — As it uses peer-to-peer (P2P) communication, WebRTC technology does not require a content delivery network (CDN) to be contracted to deliver communication streams in real time.

WebRTC Technology Challenges

Video quality — WebRTC technology does not inherently include adaptive bitrate strategies to adjust streams to different end-user bandwidth constraints, which makes it harder to ensure that every viewer receives the best possible resolution.

Reliability — UDP-based technologies are fast but unreliable. The chances of your stream experiencing interruptions or dropped data packets are greater with WebRTC than with TCP-based protocols such as HTTP Live Streaming (HLS).

Scalability — WebRTC technology is at its best when connecting two peers in a network. It can stream to multiple end-user devices, but it faces significant challenges, especially after 50 users. Without a tailored workflow that implements a dedicated media server, WebRTC technology is not suited for larger streams.

JavaScript APIs

An application programming interface (API) is a set of definitions and protocols (that is, rules) that lead to a specific result, dictating how two software components interact. WebRTC technology uses a set of three JavaScript APIs, each with a specific task.

RTCPeerConnection API

This API connects an agent’s device to another agent’s device (the far end) using the successful ICE candidate pair. It is also responsible for maintaining this connection for the duration of the stream.
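
A brief sketch of observing the connection state that this API maintains:

  // Monitor the lifecycle of an existing RTCPeerConnection and react when
  // the connection to the far end fails.
  function monitorConnection(pc) {
    pc.onconnectionstatechange = () => {
      console.log('connection state:', pc.connectionState);
      if (pc.connectionState === 'failed') {
        pc.restartIce(); // request an ICE restart on the next negotiation
      }
    };
  }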

GetUserMedia API

This API lets you capture audio and video data from an agent’s camera and microphone for encoding and streaming.
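
A minimal capture sketch, assuming an already-created RTCPeerConnection is passed in:

  // Capture the agent's camera and microphone, then add the resulting tracks
  // to the peer connection so they are encoded and streamed to the far end.
  async function captureAndSend(pc) {
    const stream = await navigator.mediaDevices.getUserMedia({
      audio: true,
      video: { width: 1280, height: 720 }
    });
    stream.getTracks().forEach((track) => pc.addTrack(track, stream));
  }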

RTCDataChannel API

This API is specific to non-audio-visual data such as text messages and image sharing.
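
A short sketch of opening a channel and receiving it on the far end; pc and remotePc are placeholders for the local and remote RTCPeerConnection objects:

  // Local agent: open a data channel on an existing connection and send a
  // text message once the channel is open.
  const channel = pc.createDataChannel('chat');
  channel.onopen = () => channel.send('Hello from the data channel');

  // Remote agent: the channel is announced on its own connection through the
  // ondatachannel event.
  remotePc.ondatachannel = (event) => {
    event.channel.onmessage = (e) => console.log('received:', e.data);
  };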

How EiTV uses WebRTC technology

EiTV implemented two formats for the transmission of live events on the EiTV CLOUD[1] streaming platform. The first format is based on the delivery of audio and video streams using the RTMP (Real Time Messaging Protocol) or RTSP (Real Time Streaming Protocol) protocols and distribution via CDN. This format has a latency of about 20 seconds between capturing and delivering the video to the recipient and distributes the stream using adaptive streaming (HLS or DASH). It is recommended for events with larger audiences, over 20 users.

The second format uses WebRTC technology to transmit events in real time. This format is recommended for live events with smaller audiences, such as virtual meetings, webinars, and training sessions with a maximum of 20 users connected simultaneously. It also allows bidirectional transmission of audio and video streams; that is, the presenter can see and interact via audio and video with the users connected to the transmission. Figure 2 shows the interface where a real-time event can be added to the platform.
