
What is WebRTC?

WebRTC, short for Web Real Time Communications, is a specification and project adding JavaScript APIs in the browser to:

1. Access a user’s webcam and microphone: getUserMedia.
2. Connect directly to another browser: PeerConnection and DataChannel.

The main use case is video calling; Google wants to build Talk and Hangouts in JavaScript. The spec provides many more example use cases.

If you have Google Chrome 21+ (latest stable) or Opera 12+, you can try it now:

  • Play Magic Xylophone:
  • Full video conference: (Chrome only. First go to about:flags and Enable PeerConnection).

Is it ready?

No, but don’t let that stop you. The specification is still evolving, and vendor prefixes are the rule (except in Opera). Right now, you can get video from a webcam reliably in Chrome and Opera.

Web applications have a new input device. Until now a webapp could only read from the keyboard and the mouse (and files dropped on to the app). To those, we can now add the webcam as an input device. Update your profile photo, snap that item you’re selling, or pair it with the File Writer API for a basic digital camera. Or play xylophone in the air (See the Magic Xylophone above).

The two demos should give you a feel for how stable and ready WebRTC implementations are. Capturing images from a user's webcam works well. Full video conferencing was very unreliable in my tests, mostly, I think, due to NAT traversal problems.


getUserMedia:

  • Chrome: In stable release.
  • Opera: In stable release.
  • Firefox: In nightlies, expected by end of year.
  • IE: In Chrome Frame. Microsoft announced it native for IE11, but now opposes WebRTC’s inclusion as a W3C standard, supporting instead its own CU-RTC-Web.
  • Safari: No news.

PeerConnection:

  • Chrome: In stable release, behind a flag.
  • Opera: No news.
  • Firefox: Expected by end of year.
  • IE: Already in Chrome Frame. See above for native.
  • Safari: No news.



Video is the most advanced part of WebRTC at the moment. There are lots of demos.

Here’s how you’d take a snapshot with your webcam. It’s the Hello World of getUserMedia. Try the full example / view source.

First you need some containers:

<video autoplay></video>
<button id="snap">Take picture</button>
<canvas id="snapshot"></canvas>
Then we de-prefix:

window.URL = window.URL || window.webkitURL;
navigator.getUserMedia = navigator.getUserMedia || navigator.webkitGetUserMedia || navigator.mozGetUserMedia;
Ask for access to the webcam. You can request {video: true}, {audio: true}, or both. The browser will ask the user for permission, and optionally to select which webcam.

navigator.getUserMedia({video: true}, onSuccess, onFailure);
Next, connect the stream to our video tag. The browser passes video from the webcam to your JavaScript function as a MediaStream. That MediaStream can be plugged into a video tag, or sent to a remote browser via a PeerConnection. A MediaStream represents audio and video streams as Blob URIs, the same way the File API represents files. The img tag can read Blob URIs that have an image file type. The video tag can read Blob URIs that have a video file type. After connecting, our video tag would look something like this: <video src="blob:550e8400-e29b-41d4-a716-446655440000" />.
function onSuccess(stream) {
  var video = document.querySelector("video");
  video.src = window.URL.createObjectURL(stream);
  // video.src = stream; // Swap this line for the one above on Firefox (Mozilla's proposed simplification).
}
Once you have the stream in a video element, you can copy it to a canvas, use CSS on it, and do all the things you can do with a video element. See HTML5 video at html5rocks. From a canvas you can get a base64 data URL, and either plug that data URL into an img.src, or upload the base64 image data to the server. Here, we copy the image to a canvas (to take a snapshot), and print the base64 data URL on the console.
function snap() {
  var snapshot = document.getElementById("snapshot"),
      video = document.querySelector("video");
  snapshot.getContext("2d").drawImage(video, 0, 0, snapshot.width, snapshot.height);
  console.log(snapshot.toDataURL("image/png"));
}
document.getElementById("snap").onclick = snap;
The specification doesn’t yet require a specific video codec. The de-facto standard, supported by both Chrome and Firefox, is VP8 (patent grant from Google).

The WebRTC project offers a complete stack for voice communications. It includes not only the necessary codecs, but other components crucial for a great experience: software-based acoustic echo cancellation (AEC), automatic gain control (AGC), noise reduction and suppression, and hardware access and control across multiple platforms.

Audio from a user's microphone cannot be recorded yet, or piped to an audio tag, so the only thing you can do with it is connect it to a PeerConnection, which isn't quite ready yet. Hence the lack of interesting microphone audio demos.

Audio codecs required by the WebRTC Codec draft are:

  • G.711, a standard telecoms codec from 1978.
  • Opus, a brand new BSD licensed high-quality high-speed adaptive codec.

Firefox supports both of those (from Mozilla’s wiki).

Google, citing licensing issues, does not yet support Opus but is a strong supporter of it. Currently Chrome supports (from FAQ):

  • iSAC, BSD licensed and the default codec in Google Talk.
  • iLBC, BSD licensed.
  • G.711 and G.722 (another telecoms standard).


A PeerConnection allows two users to communicate directly, browser to browser. You get your webcam’s stream via getUserMedia, send it over a PeerConnection, and the receiver puts it into a video DOM element. Instant video conferencing!

Most browsers do not have a public IP address, so PeerConnection has to do NAT traversal, for which it uses ICE. The Interactive Connectivity Establishment (ICE) protocol provides a structured mechanism to determine the optimal communication path between two peers. It will use STUN or TURN as needed, trying several paths.

ICE needs the two browsers to talk to each other to set up the direct connection (they exchange “ICE candidates”, which are address/port pairs). Typically they will use the web server they got their current page from, accessed for example via WebSocket. The WebRTC specification calls this the signalling channel, and does not require that it take any particular form. You could go entirely server-less and read the ICE candidates out to each other over the phone.
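As a sketch of what a signalling channel might look like: the URL, the JSON envelope, and the commented browser glue below are all assumptions, since the spec leaves the channel's transport and format entirely up to you.

```javascript
// Hypothetical signalling sketch. Nothing here is mandated by the spec.
var SIGNAL_URL = "wss://example.com/signal"; // hypothetical server

// Wrap an ICE candidate (an address/port pair plus metadata) in a small
// JSON envelope so both browsers can tell candidates apart from other
// signalling traffic.
function encodeCandidate(label, candidate) {
  return JSON.stringify({ type: "candidate", label: label, candidate: candidate });
}

function decodeMessage(raw) {
  return JSON.parse(raw);
}

// Browser-only glue (commented out; runs only in a page, and the
// PeerConnection method names follow Chrome's current prefixed API,
// which may change):
// var ws = new WebSocket(SIGNAL_URL);
// function onIceCandidate(candidate) {
//   ws.send(encodeCandidate(candidate.label, candidate.toSdp()));
// }
// ws.onmessage = function (event) {
//   var msg = decodeMessage(event.data);
//   if (msg.type === "candidate") {
//     pc.processIceMessage(new IceCandidate(msg.label, msg.candidate));
//   }
// };
```

The envelope's "type" field is just a convention so the same channel can later carry offers and answers as well as candidates.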

Here’s a snippet of how Chrome initiates PeerConnection setup now:

try {
  pc = new webkitPeerConnection00("STUN", onIceCandidate);
} catch (e) {
  alert("Cannot create PeerConnection object; is the 'PeerConnection' flag enabled in about:flags?");
}

pc.onconnecting = onSessionConnecting;
pc.onopen = onSessionOpened;
pc.onaddstream = onRemoteStreamAdded;
pc.onremovestream = onRemoteStreamRemoved;


PeerConnection can only send MediaStream types – it was specifically designed for video conferencing. Direct browser-to-browser connection is much more widely useful than that, so WebRTC adds a DataChannel object to transfer generic data: Blob, ArrayBuffer, or String. The spec is still rough here. A DataChannel might feel like a WebSocket (same API), but connect directly to another browser.
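Because the spec is still rough, the following is only a sketch: the framing helper is a hypothetical application-level convention, and the commented glue assumes the draft's WebSocket-like surface.

```javascript
// Hypothetical app-level framing for chat over a DataChannel; the
// channel itself just delivers whatever you send.
function frameMessage(from, text) {
  return JSON.stringify({ from: from, text: text });
}

function unframeMessage(raw) {
  var msg = JSON.parse(raw);
  return msg.from + ": " + msg.text;
}

// Browser-only sketch (commented out), assuming the draft's
// WebSocket-like API -- names may change as the spec settles:
// var channel = pc.createDataChannel("chat");
// channel.onopen = function () { channel.send(frameMessage("alice", "hi")); };
// channel.onmessage = function (event) { console.log(unframeMessage(event.data)); };
```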

This is a very exciting part of the spec. It would allow building peer-to-peer networks purely in the browser.

The Chrome team hopes to have DataChannel by end of the year.

Putting it all together: Video chat goes through these steps to start a video call:

1. openChannel: Opens the signalling channel; in this case XMLHttpRequest with long-polling (using Google App Engine's 'channel' API).
2. getUserMedia: Request access to webcam for video and audio.
3. Add local video stream to video HTML tag.
4. createPeerConnection: Create a PeerConnection.
5. Add local video MediaStream to PeerConnection.
6. Connect to remote. This is the session description phase: the local browser extends an ‘offer’ to the remote, which sends back an ‘answer’, both expressed in the Session Description Protocol (SDP). NAT traversal is negotiated here.
7. Receive remote video stream, add it to a video HTML tag. The users are now video conferencing.
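The steps above can be sketched as glue code. Everything browser-specific is commented out, and the small message router is an assumption about how signalling messages might be tagged, not part of the spec.

```javascript
// Dispatch an incoming signalling message (offer / answer / candidate)
// to a matching handler; returns true if one was registered. The
// envelope-with-a-"type"-field convention is an assumption.
function routeSignal(msg, handlers) {
  if (typeof handlers[msg.type] === "function") {
    handlers[msg.type](msg);
    return true;
  }
  return false;
}

// Browser-only outline of the seven steps (commented out; API names
// follow Chrome's current prefixed implementation and may change):
// 1. openChannel();                                  // signalling channel
// 2. navigator.getUserMedia({video: true, audio: true}, onSuccess, onFailure);
// 3. localVideo.src = window.URL.createObjectURL(localStream);
// 4. var pc = new webkitPeerConnection00("STUN", onIceCandidate);
// 5. pc.addStream(localStream);
// 6. routeSignal(incoming, {offer: onOffer, answer: onAnswer, candidate: onCandidate});
// 7. pc.onaddstream = function (e) {
//      remoteVideo.src = window.URL.createObjectURL(e.stream);
//    };
```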


Expect to see a lot more coverage of WebRTC near the end of the year. Chrome and Firefox are hoping to land the bulk of it by then (see Chrome status and Firefox announcement).

WebRTC means several things:

  • We can start using webcams in our apps. No more hunting for your digital camera when you’re trying to list something for sale, just hold it in front of your webcam and press the button.
  • The browser will be an audio / video conferencing tool, so goodbye desktop clients (Skype).
  • DataChannel is a firm swing of the pendulum back towards thick client. Server-less webapps will be possible.
  • The web platform is feeling increasingly like Lego and like, well, a platform. You plug a MediaStream into a PeerConnection or into a video tag. The media stream could come from a webcam, your screen (for screen sharing – mentioned here), or a pre-recorded source. The contents of a video tag can be plugged into a canvas, and everything can be styled with CSS.


  • Spec:
  • WebRTC in Chrome:
  • Justin Uberti’s Google I/O talk
  • getUserMedia intro at HTML5rocks:

About the author

Graham King

Graham's life has revolved around growing software since age 12. After many years of C, and more of Java, Graham fell in love with Python in 2003 and Django in 2005. Most of his previous career …

Graham is no longer active with Lincoln Loop, but you can find him at