UVC H264 Encoding cameras support in GStreamer

More and more people are doing video conferencing everyday, and for that to be possible, the video has to be encoded before being sent over the network. The most efficient and most popular codec at this time is H264, and since the UVC (USB Video Class) specification 1.1, there is support for H264 encoding cameras.

One such camera is the Logitech C920, a really great camera that can produce a 1080p H264-encoded stream at 30 fps. As part of my job at Collabora, I was tasked with adding support for such cameras in GStreamer. After months of work, it’s finally done and has now been integrated upstream into gst-plugins-bad 0.10 (the port to GST 1.0 is pending).

One important aspect here is that when you are capturing a high quality, high resolution H264 stream, you don’t want to be wasting your CPU to decode your own video in order to show a preview window during a video chat, so it was important for me to be able to capture both H264 and raw data from the camera. For this reason, I have decided to create a camerabin2-style source: uvch264_src.

Uvch264_src is a source element which implements the GstBaseCameraSrc API. This means that instead of the traditional ‘src’ pad, it provides three distinct source pads: vidsrc, imgsrc and vfsrc. The GstBaseCameraSrc API is based heavily on the concept of a “Camera” application for phones. As such, vidsrc is meant as a source for recording video, imgsrc as a source for taking single pictures, and vfsrc as a source for the viewfinder (preview) window. A ‘mode’ property is used to switch between video mode and image mode for capture. However, uvch264_src only supports video mode, so the imgsrc pad will never be used.

When the element goes to PLAYING, only vfsrc will output data; you have to send the “start-capture” action signal to activate the vidsrc/imgsrc pads, and the “stop-capture” action signal to stop capturing from them. Note that vfsrc outputs data whenever the element is in PLAYING, so it must always be linked (link it to fakesink if you don’t need the preview, otherwise you’ll get a not-linked error). If you want to test this on the command line (gst-launch), you can set the ‘auto-start’ property to TRUE and uvch264_src will automatically start the capture after going to PLAYING.
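
To make this concrete, here is a minimal sketch of an application driving the capture by hand (GStreamer 0.10 C; the pipeline string, the device path and the 5-second capture duration are just assumptions for the example):

#include <gst/gst.h>

int
main (int argc, char *argv[])
{
  GstElement *pipeline, *src;

  gst_init (&argc, &argv);

  /* vfsrc must always be linked, so send the preview to fakesink here */
  pipeline = gst_parse_launch ("uvch264_src device=/dev/video1 name=src "
      "src.vfsrc ! queue ! fakesink "
      "src.vidsrc ! queue ! video/x-h264 ! h264parse ! fakesink", NULL);
  src = gst_bin_get_by_name (GST_BIN (pipeline), "src");

  gst_element_set_state (pipeline, GST_STATE_PLAYING);

  /* activate the vidsrc pad */
  g_signal_emit_by_name (src, "start-capture");

  g_usleep (5 * G_USEC_PER_SEC); /* capture for ~5 seconds */

  /* deactivate the vidsrc pad; vfsrc keeps outputting data */
  g_signal_emit_by_name (src, "stop-capture");

  gst_element_set_state (pipeline, GST_STATE_NULL);
  gst_object_unref (src);
  gst_object_unref (pipeline);
  return 0;
}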

You can request H264, MJPG and raw data from vidsrc, but only MJPG and raw data from vfsrc. When requesting H264 from vidsrc, the maximum resolution for vfsrc will be 640×480, which can be served as JPG or as raw (decoded from JPG). So if you don’t want to use any CPU for decoding, you should ask for a raw resolution lower than 432×240 (with the C920), which will capture YUV directly from the camera. Any higher resolution won’t fit in the available USB bandwidth, so the preview will have to be captured in MJPG (uvch264_src will take care of that for you).
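
For example, a pipeline like the following takes the zero-CPU path: the preview is raw YUV below the 432×240 limit, and the H264 stream is muxed straight to a file without ever being decoded. This is only a sketch: the device path is an assumption, and matroskamux is my own choice of container:

gst-launch uvch264_src device=/dev/video1 name=src auto-start=true src.vfsrc ! queue ! "video/x-raw-yuv,width=320,height=240,framerate=30/1" ! xvimagesink src.vidsrc ! queue ! video/x-h264,width=1920,height=1080,framerate=30/1 ! h264parse ! matroskamux ! filesink location=capture.mkv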

The source has two types of controls: static controls, which must be set in the READY state, and dynamic controls, which can be changed while in the PLAYING state. The description of each property specifies whether it is a static or a dynamic control, as well as its flags. The supported static controls are: initial-bitrate, slice-units, slice-mode, iframe-period, usage-type, entropy, enable-sei, num-reorder-frames, preview-flipped and leaky-bucket-size. The dynamic controls are: rate-control, fixed-framerate, level-idc, peak-bitrate, average-bitrate, min-iframe-qp, max-iframe-qp, min-pframe-qp, max-pframe-qp, min-bframe-qp, max-bframe-qp, ltr-buffer-size and ltr-encoder-control.
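
As a quick sketch of the difference (the values here are made up, and valid ranges are camera-specific, so probe them first with the element actions described below):

/* static controls: set them while uvch264_src is in READY */
g_object_set (G_OBJECT (src), "initial-bitrate", 3000000, "iframe-period", 2000, NULL);

gst_element_set_state (pipeline, GST_STATE_PLAYING);

/* dynamic controls: can be changed at any time while in PLAYING */
g_object_set (G_OBJECT (src), "average-bitrate", 1500000, NULL);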

Each control has a minimum, maximum and default value, and since those are specific to each camera, they need to be probed while the element is in the READY state. For that reason, I have added three element actions to the source in order to probe those settings: get-enum-setting, get-boolean-setting and get-int-setting. These functions return TRUE if the property is valid and the information was successfully retrieved, or FALSE if the property is invalid (giving an invalid name or a boolean property to get-int-setting, for example) or if the camera returned an error while probing its capabilities.
The prototypes of the signals are:

  • gboolean get_enum_setting (GstElement *object, char *property, gint *mask, gint *default_value);
    The mask is a bit field specifying which enum values can be set, where each set bit is (1 << enum_value).
    For example, the ‘slice-mode’ enum can only be ignored (0) or slices/frame (3), so the returned mask would be 0x09.
    That is equivalent to (1 << 0 | 1 << 3), which is:
    (1 << UVC_H264_SLICEMODE_IGNORED) | (1 << UVC_H264_SLICEMODE_SLICEPERFRAME)
  • gboolean get_int_setting (GstElement *object, char *property, gint *min, gint *def, gint *max);
    This one gives the minimum, maximum and default values for a property. If a property has min and max set to the same value, then the property cannot be changed; for example, the C920 reports the num-reorder-frames setting with min=0, def=0 and max=0, which means the C920 doesn’t support reordered frames.
  • gboolean get_boolean_setting (GstElement *object, char *property, gboolean *changeable, gboolean *default_value);
    The ‘changeable’ output will be TRUE only if changing the property has an effect on the encoder. For example, the C920 does not support the preview-flipped property, so for it changeable=FALSE (and the default value is FALSE in this case), but it does support the enable-sei property, so for that one changeable=TRUE (with a default value of FALSE).

This should give you all the information you need to know which settings are available on the hardware and, from there, to control those properties.

Since these are element actions, they are called this way:

gboolean return_value, changeable, default_bool;
gint mask, minimum, maximum, default_int, default_enum;

/* each call fills the out parameters and sets return_value to TRUE on success */
g_signal_emit_by_name (G_OBJECT(element), "get-enum-setting", "slice-mode", &mask, &default_enum, &return_value);
g_signal_emit_by_name (G_OBJECT(element), "get-boolean-setting", "enable-sei", &changeable, &default_bool, &return_value);
g_signal_emit_by_name (G_OBJECT(element), "get-int-setting", "iframe-period", &minimum, &default_int, &maximum, &return_value);

Apart from that, the source also supports the GstForceKeyUnit video event for dynamically requesting keyframes, as well as custom upstream events to control LTR (Long-Term Reference frames), bitrate, QP, rate-control and level-idc, through, respectively, the uvc-h264-ltr-picture-control, uvc-h264-bitrate-control, uvc-h264-qp-control, uvc-h264-rate-control and uvc-h264-level-idc custom upstream events (read the code for more information!). The source also supports receiving the ‘renegotiate’ custom upstream event, which will make it renegotiate according to the caps on its pads. This is useful if you want to enable/disable H264 streaming or switch resolution/framerate easily while the pipeline is running: simply change your caps and send the renegotiate event.
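
Here is a sketch of what sending those events can look like from application code. The ‘renegotiate’ structure name comes from the element, but the choice of pad is my assumption, and the force-key-unit helper lives in libgstvideo (gst-plugins-base 0.10.36+):

#include <gst/gst.h>
#include <gst/video/video.h>

/* assumes 'src' is the uvch264_src element of a running pipeline */
static void
request_keyframe_and_renegotiate (GstElement *src)
{
  GstPad *vidsrc = gst_element_get_static_pad (src, "vidsrc");

  /* ask the encoder for a keyframe (upstream GstForceKeyUnit event) */
  gst_pad_send_event (vidsrc,
      gst_video_event_new_upstream_force_key_unit (GST_CLOCK_TIME_NONE,
          TRUE, 1));

  /* after changing the caps on the pads, tell the source to renegotiate */
  gst_pad_send_event (vidsrc,
      gst_event_new_custom (GST_EVENT_CUSTOM_UPSTREAM,
          gst_structure_new ("renegotiate", NULL)));

  gst_object_unref (vidsrc);
}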

I have written a GUI test application which you can use to test the camera and the source’s various features. It can also serve as a reference implementation showing how the source can be used. The test application resides in gst-plugins-bad, under tests/examples/uvch264/ (make sure to run it from its source directory, though).

[Screenshot: uvch264_src test application]

You can also use this example gst-launch pipeline to test capturing from the camera. It will show a small preview window as well as a 1080p H264 stream that is decoded locally:

gst-launch uvch264_src device=/dev/video1 name=src auto-start=true src.vfsrc ! queue ! "video/x-raw-yuv,width=320,height=240,framerate=30/1" ! xvimagesink src.vidsrc ! queue ! video/x-h264,width=1920,height=1080,framerate=30/1,profile=constrained-baseline ! h264parse ! ffdec_h264 ! xvimagesink

That’s about it. If you are interested in using uvch264_src to capture from one of the UVC H264 encoding cameras, make sure you upgrade to the latest git versions of gstreamer, gst-plugins-base, gst-plugins-good and gst-plugins-bad (or 0.10.37+ for gstreamer/gst-plugins-base, 0.10.32 for gst-plugins-good and 0.10.24 for gst-plugins-bad, whenever those versions get released).

I would like to thank Collabora and Cisco for sponsoring me to work on this great project, it couldn’t have been possible without them!

If you have any more questions about this subject, feel free to contact me.

Enjoy!

ExpoLibre 2011 in Talca/Chile

Hi all,

About 2 weeks ago, I was in Talca, Chile for the ExpoLibre 2011 conference. It was really awesome, and I had one of my best experiences ever as a speaker!

One of the particularities of that conference is that it’s organized by the university, and its target audience is students, teachers and open source enthusiasts. The majority of the attendees were not open source developers, but people who wanted to learn more about it.
For that reason, this was my very first “motivational talk” rather than the usual technical talks I’ve given in the past, and I loved it!

Another interesting point was that the audience mostly spoke Spanish, and not everyone understood English, so I had my colleagues (Reynaldo Verdejo and Thibault Saunier) there to translate what I was saying. That made for a very pleasant experience, as I had time to relax between slides while they were translating, and it also made the talk more casual and interactive. I wasn’t nervous for the first time, and it felt great! 🙂

After the talk, I received some very interesting questions and I thoroughly enjoyed answering every one of them. I saw a lot of people who were interested, and I felt like I connected with everyone and was able to reach them with my ideas. If I was able to change at least one attendee’s perception of open source, and hopefully get them involved in various FLOSS projects, then my mission is a success!

Today, the organizers of the ExpoLibre conference sent me the video recording of my talk, and I’ve shared it on YouTube so everyone can listen to what I had to say. I hope everyone enjoys it as much as I enjoyed giving it.

On a final note, I’d like to say that Chile is a beautiful country. I stayed there for almost two weeks, and even though travel from/to Canada is a pain, it was totally worth it! I can’t wait for the next opportunity for me to go there.

Update: Some people complained about the rhythm being broken because of the translation to Spanish, so I asked here for anyone who wanted to contribute to edit the video and crop out the non-English sections, so that English-only speakers could view the talk in one constant rhythm/flow, without the interruptions from the translators.

Patrick Donnelly, one of the people who saw the video (and my request for an edit), did just that and commented below with a link to an English-only version of my talk (the intro and the questions part were left untouched at my request). Here it is for those who need it:

[Embedded video: English-only edit of the talk]

And here is the original, unedited version of the talk I gave, enjoy it!

PS: The video I tried to show to the audience (around 6:30), which did not work, was this one: http://www.youtube.com/watch?v=20ClL3mL8Gc

And here are the slides used during the talk, in PDF format: http://people.collabora.co.uk/~kakaroto/expolibre-2011.pdf


KaKaRoTo


aMSN 0.98.2 to be released without Audio/Video support

Hey,

We (the aMSN team) want to release aMSN 0.98.2 soon because of the various issues we faced with 0.98.1 after Microsoft changed their protocols, mainly the problem of the nickname changing to the email address at every connection.

We also recently realized that the Audio/Video capability has to be dropped, because unfortunately it just doesn’t work anymore 🙁

Microsoft forced everyone to upgrade to the latest version of Windows Live Messenger, WLM 2009, which uses MSNP18 and does Audio/Video calls through tunneled SIP (P2P SIP messages, without an external SIP server). They then realized that their SIP servers weren’t being used anymore, so they had the great idea of shutting them down… The result? aMSN can’t do Audio/Video calls because the SIP server refuses our connections.

Now we have two choices: either move to MSNP18, or disable SIP calls completely. Obviously, we can’t move to MSNP18 at the moment, because MPOP support isn’t really ready yet and MSNP2PV2 hasn’t been implemented, so we had to remove Audio/Video support from aMSN SVN for now, and only keep it available for those who are brave enough to try out MSNP18.

We will soon release 0.98.2 as it is long overdue now, and we want to let people know why their A/V calls are not available to them anymore.

I hope everybody will understand why this is happening and that we won’t get flooded with n00b questions every day…