
YouTube creators meet Pope Francis to discuss promoting understanding and empathy

YouTube has helped millions of people see that we have a lot in common, despite our differences. Building these bridges can start with a simple conversation, and over the past 11 years, we’ve seen YouTube creators use the power of video to do just that. From Hayla Ghazal encouraging women in the Middle East and around the world to speak up, to Dulce Candy sharing her own story as an undocumented immigrant and military veteran, creators from around the world have used our platform to express themselves, encourage new perspectives, and inspire solidarity within global fan bases.

We want to continue empowering people to come to YouTube to tell stories and form connections that encourage empathy and understanding between diverse communities.

That’s why today 11 international YouTube creators met with Pope Francis, who cares deeply about bringing young people together. This first-of-its-kind dialogue took place during the VI Scholas World Congress, which the Pope created to encourage peace through real encounters with youth from different backgrounds.

The YouTube creators who participated in this conversation represent more than 27 million subscribers globally. They come from ten different countries and diverse religious backgrounds: Louise Pentland (United Kingdom), Lucas Castel (Argentina), Matemática Río (Brazil), Hayla Ghazal (United Arab Emirates), Dulce Candy (United States), Matthew Patrick (United States), Jamie and Nikki (Australia and Sudan/Egypt), Greta Menchi (Italy), Los Polinesios (Mexico) and anna RF (Israel).



During the conversation with the Pope, these creators raised topics that they are passionate about as role models, including immigrant rights, gender equality, loneliness and self-esteem, and greater respect for diversity of all kinds.

We’re inspired by the many conversations these creators have sparked throughout their YouTube journeys. To hear more about what they discussed at the Vatican today, tune in to each of their channels for personal videos in the coming weeks. We hope to continue helping people share their stories - the more we can all understand, the more we can come together as a global community.

Juniper Downs, Head of Policy for YouTube, recently watched “I am a Muslim, hug me if you trust me.”

Source: YouTube Blog


YouTube brings E3 to you wherever you are

Last year, millions of gamers came together on YouTube for the Electronic Entertainment Expo (E3). We watched as Bethesda unveiled “Fallout 4,” and many of us had our prayers answered when Square Enix shared a first look at the “Final Fantasy VII” remake. For more than a few gamers, E3 is the must-see gaming event of the year, an annual moment that defines the industry (and our collective wallets) for the coming 12 months.

YouTube will give gamers around the world front-row access to everything E3 again this year with dozens of live streams scheduled throughout the week. To make moments like this even better, we are launching event pages in YouTube Gaming, which serve as destinations for watching the biggest gaming and eSports events. Kicking things off, we’ll have a unique event hub for E3, available at gaming.youtube.com/e3 in June.

A home for live streams and on-demand videos, the YouTube Gaming E3 hub will put all our E3 coverage in one place. You’ll be able to browse videos and streams, chat live with fellow gamers, vote for your favorite trailers in the legendary Trailer Battle, and catch up on announcements you might have missed.

[Image: mockup of the E3 event page in YouTube Gaming]


Coverage begins on Sunday, June 12, with early press conferences from EA and Bethesda. Stay tuned on YouTube after the press conferences because you won't want to miss post-coverage from our friends at Rooster Teeth - who will be streaming from the conference throughout the week.

The play-by-play coverage continues on Monday, June 13 at 9 a.m. PT, with YouTube Live at E3, an exclusive 12-hour stream hosted by Geoff Keighley and brought to you by Samsung. Live at E3 will bring you the live coverage of press conferences you crave along with developer interviews, live let’s plays of new games you won’t find anywhere else, and surprise guests like MatPat, iHasCupquake and CaptainSparklez.


Speaking of MatPat, if you’re in L.A. before E3, make sure to stop by the YouTube Space LA for a fan event on June 8 and a peek at MatPat's Game Lab. RSVP here.

See you at E3!

Ryan Wyatt, Global Head of Gaming Content, recently watched “Game Lab: TEST Games In Real Life then EXPERIENCE them in VR!”

Source: YouTube Blog


A YouTube experience like nothing you’ve seen before

Ever wish you could swim with sharks, ride in an Indy Car, or go on a world tour? Well, later this year you’ll be able to experience these events and more as if you were really there with the YouTube VR app.

For more than a year, we’ve been adding support for new video and audio formats on YouTube like 360-degree video, VR video and Spatial Audio. These were the first steps on our way toward a truly immersive video experience, and now we're taking another one with the YouTube VR app for Daydream, Google's platform for high-quality mobile virtual reality, announced today at Google I/O.

We’re creating the YouTube VR app to provide an easier, more immersive way to find and experience virtual reality content on YouTube. It also comes with all the YouTube features you already love, like voice search, discovery, and playlists, all personalized for you, so you can experience the world's largest collection of VR videos in a whole new way.

And thanks to the big, early bet we made on 360-degree and 3-D video, you will be able to see all of YouTube’s content on the app—everything from classic 16x9 videos to 360-degree footage to cutting-edge VR experiences in full 3-D. Whether you want a front row seat to your favorite concert, access to the best museums in the world, or a midday break from work watching your favorite YouTube creator, YouTube VR will have it all.

To bring even more great VR content onto YouTube, we’ve been working with some amazing creators to experiment with new formats that offer a wide range of virtual experiences. We’re already collaborating with the NBA, BuzzFeed and Tastemade to explore new ways of storytelling in virtual environments that will provide valuable lessons about the way creators and viewers interact with VR video. Stay tuned!

We’ve also been working with camera partners to make Jump-ready cameras, such as the GoPro Odyssey, available to creators, to help make the production of VR video more accessible. And today, we’re officially launching our Jump program at the YouTube Spaces in L.A. and NYC, and we will bring it to all YouTube Space locations around the globe soon.

We’re just beginning to understand what a truly immersive VR experience can bring to fans of YouTube, but we’re looking forward to making that future a (virtual) reality.

Kurt Wilms, Senior Product Manager, YouTube Virtual Reality, recently watched "Insane 360 video of close-range tornado near Wray, CO yesterday!"

Source: YouTube Blog


Machine learning for video transcoding

At YouTube we care about the quality of the pixels we deliver to our users. With many millions of devices uploading to our servers every day, the content variability is so huge that delivering an acceptable audio and video quality in all playbacks is a considerable challenge. Nevertheless, our goal has been to continuously improve quality by reducing the amount of compression artifacts that our users see on each playback. While we could do this by increasing the bitrate for every file we create, that would quite easily exceed the capacity of many of the network connections available to you. Another approach is to optimize the parameters of our video processing algorithms to meet bitrate budgets and minimum quality standards. While Google’s compute and storage resources are huge, they are finite and so we must temper our algorithms to also fit within compute requirements. The hard problem then is to adapt our pipeline to create the best quality output for each clip you upload to us, within constraints of quality, bitrate and compute cycles.


This is a well known triad in the world of video compression and transcoding. The problem is usually solved by finding a sweet spot of transcoding parameters that seem to work well on average for a large number of clips. That sweet spot is sometimes found by trying every possible set of parameters until one is found that satisfies all the constraints. Recently, others have been using this “exhaustive search” idea to tune parameters on a per clip basis.


What we’d like to show you in this blog post is a new technology we have developed that adapts our parameter set for each clip automatically using Machine Learning. We’ve been using this over the last year for improving the quality of movies you see on YouTube and Google Play.


The good and bad about parallel processing



We ingest more than 400 hours of video per minute. Each file must be transcoded from the uploaded video format into a number of other video formats with different codecs so we can support playback on any device you might have. The only way we can keep up with that rate of ingest and quickly show you your transcoded video in YouTube is to break each file into pieces called “chunks,” and process these in parallel. Every chunk is processed independently and simultaneously by CPUs in our Google cloud infrastructure. The complexity involved in chunking and recombining the transcoded segments is significant. Quite aside from the mechanics of assembling the processed chunks, maintaining the quality of the video in each chunk is a challenge. This is because to have as speedy a pipeline as possible, our chunks don’t overlap, and are also very small: just a few seconds. So the good thing about parallel processing is increased speed and reduced latency. But the bad thing is that without the information about the video in the neighboring chunks, it’s now difficult to control chunk quality so that there is no visible difference between the chunks when we tape them back together. Small chunks don’t give the encoder much time to settle into a stable state, so each encoder treats each chunk slightly differently.
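To make the idea concrete, here is a minimal sketch of chunked, parallel transcoding in Python with ffmpeg. The chunk length, codec settings and file names are illustrative only, not our production pipeline, and recombining the outputs is left out.

```python
# Minimal sketch of chunk-and-transcode-in-parallel (illustrative only).
# Assumes ffmpeg is installed and on PATH.
import glob
import subprocess
from multiprocessing import Pool

CHUNK_SECONDS = 4   # "just a few seconds" per chunk, as described above
CRF = 23            # one fixed quality setting applied to every chunk

def split_into_chunks(src):
    """Cut the source into short, non-overlapping chunks without re-encoding."""
    subprocess.run([
        "ffmpeg", "-i", src, "-c", "copy", "-map", "0",
        "-f", "segment", "-segment_time", str(CHUNK_SECONDS),
        "-reset_timestamps", "1", "chunk_%04d.mp4",
    ], check=True)
    return sorted(glob.glob("chunk_*.mp4"))

def transcode_chunk(chunk):
    """Each chunk is encoded independently: no state is shared with its neighbors."""
    out = chunk.replace("chunk_", "out_")
    subprocess.run([
        "ffmpeg", "-y", "-i", chunk, "-c:v", "libx264", "-crf", str(CRF),
        "-c:a", "aac", out,
    ], check=True)
    return out

if __name__ == "__main__":
    chunks = split_into_chunks("upload.mp4")
    with Pool(processes=8) as pool:          # chunks are processed simultaneously
        outputs = pool.map(transcode_chunk, chunks)
    # Recombining the transcoded chunks, and keeping quality consistent across
    # their boundaries, is the hard part discussed in the rest of this post.
```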

Smart parallel processing



You could say that we are shooting ourselves in the foot before starting the race. Clearly, if we communicate information about chunk complexity between the chunks, each encoder can adapt to what’s happening in the chunks after or before it. But inter-process communication increases overall system complexity and requires some extra iterations in processing each chunk.


Actually, OK, truth is we’re stubborn here in Engineering and we wondered how far we could push this idea of “don’t let the chunks talk to each other.”


The plot below shows an example of the PSNR in dB per frame over two chunks from a 720p video clip, using H.264 as the codec. A higher value of PSNR means better picture quality and a lower value means poorer quality. You can see that one problem is that the quality at the start of a chunk is very different from that at the end of the chunk. Aside from the average quality level being worse than we would like, this variability in quality causes an annoying pulsing artifact.
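For reference, PSNR is computed per frame from the mean squared error between the reference and encoded pictures. A small sketch, assuming 8-bit frames already decoded into NumPy arrays:

```python
# Per-frame PSNR between a reference clip and its encoded version (sketch only).
import numpy as np

def psnr(reference: np.ndarray, encoded: np.ndarray) -> float:
    """PSNR in dB for one 8-bit frame; higher means closer to the reference."""
    mse = np.mean((reference.astype(np.float64) - encoded.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")            # identical frames
    return 10.0 * np.log10(255.0 ** 2 / mse)

def per_frame_psnr(ref_frames, enc_frames):
    return [psnr(r, e) for r, e in zip(ref_frames, enc_frames)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.integers(0, 256, size=(720, 1280), dtype=np.uint8)
    enc = np.clip(ref.astype(int) + rng.integers(-3, 4, ref.shape), 0, 255)
    print(f"{psnr(ref, enc):.1f} dB")  # toy example with synthetic noise
```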


Because of small chunk sizes, we would expect that each chunk behaves like the previous and next one, at least statistically. So we might expect the encoding process to converge to roughly the same result across consecutive chunks. While this is true much of the time, it is not true in this case. One immediate solution is to change the chunk boundaries so that they align with high-activity video behavior like fast motion, or a scene cut. Then we would expect that each chunk is relatively homogeneous, so the encoding result should be more uniform. It turns out that this does improve the situation, but not as much as we’d like, and the instability is still often there.


The key is to allow the encoder to process each chunk multiple times, learning on each iteration how to adjust its parameters in anticipation of what happens across the entire chunk instead of just a small part of it. This results in the start and end of each chunk having similar quality, and because the chunks are short, it is now more likely that the differences across chunk boundaries are also reduced. But even then, we noticed that it can take quite a number of iterations for this to happen. We observed that the number of iterations is affected a great deal by the quantization-related parameter (CRF) of the encoder on that first iteration. Even better, there is often a “best” CRF that allows us to hit our target bitrate at a desired quality with just one iteration. But this “best” setting is actually different for every clip. That’s the tricky bit. If only we could work out what that setting was for each clip, then we’d have a simple way of generating good-looking clips without chunking artifacts.
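The naive version of that iterative loop looks something like the sketch below: encode, measure the bitrate, nudge the CRF, repeat. The step size and stopping rule here are illustrative; the point is that the number of passes depends heavily on the starting CRF.

```python
# Sketch of the "re-encode and adjust" loop (illustrative, not YouTube's pipeline).
import os
import subprocess

def encode(chunk, crf, out="tmp.mp4"):
    subprocess.run(["ffmpeg", "-y", "-i", chunk, "-c:v", "libx264",
                    "-crf", str(crf), "-an", out], check=True)
    return out

def measured_kbps(path, duration_s):
    return os.path.getsize(path) * 8 / 1000 / duration_s

def find_crf(chunk, duration_s, target_kbps, crf=23, tol=0.05, max_iters=6):
    """Iterate until the chunk's bitrate lands within tol of the target."""
    for _ in range(max_iters):
        out = encode(chunk, crf)
        kbps = measured_kbps(out, duration_s)
        if abs(kbps - target_kbps) / target_kbps <= tol:
            break                                    # close enough to the budget
        crf += 1 if kbps > target_kbps else -1       # higher CRF -> lower bitrate
    return crf
```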


The plot on the right shows the result of many experiments with our encoder at varying CRF (constant quality) settings, over the same 1080p clip. After each experiment we measured the bitrate of the output file and each point shows the (CRF, bitrate) pair for that experiment. There is a clear relationship between these two values. In fact it is very well modeled as an exponential fit with three parameters, and the plot shows just how good that modeled line is in fitting the observed data points. If we knew the parameters of the line for our clip, then we’d see that to create a 5 Mbps version of this clip (for example) we’d need a CRF of about 20.
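Here is a sketch of that fit. The exact functional form is not spelled out here, so bitrate ≈ a·exp(b·CRF) + c is used below as one plausible three-parameter exponential, with made-up measurements; inverting the fitted curve reads off the CRF for a target bitrate.

```python
# Fit a three-parameter curve to (CRF, bitrate) measurements, then invert it.
import numpy as np
from scipy.optimize import curve_fit

def model(crf, a, b, c):
    # Assumed form: bitrate decays exponentially with CRF (illustrative choice).
    return a * np.exp(b * crf) + c

# Repeated encodes of the same 1080p clip at different CRF settings
# (made-up numbers for illustration).
crfs = np.array([12, 16, 20, 24, 28, 32, 36])
mbps = np.array([15.8, 8.8, 5.0, 2.9, 1.7, 1.1, 0.7])

(a, b, c), _ = curve_fit(model, crfs, mbps, p0=(90.0, -0.15, 0.0))

def crf_for_bitrate(target_mbps):
    """Invert the fitted curve: which CRF should land near the target bitrate?"""
    return np.log((target_mbps - c) / a) / b

print(round(crf_for_bitrate(5.0)))   # about 20 for a 5 Mbps target, with these numbers
```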

Pinky and the Brain



What we needed was a way to predict our three curve-fitting parameters from low complexity measurements about the video clip. This is a classic problem in machine learning, statistics and signal processing. The gory mathematical details of our solution are in technical papers that we published recently.1 You can see there how our thoughts evolved. Anyway, the idea is rather simple: predict the three parameters given things we know about the input video clip, and read off the CRF we need. This prediction is where the “Google Brain” comes in.


The “things we know about the input video clip” are called video “features.” In our case this is a vector of features containing measurements like input bit rate, motion vector bits in the input file, resolution of the video and frame rate. These measurements can also be made from a very fast low quality transcode of the input clip to make them more informative. However, the exact relationship between the features and the curve parameters for each clip is rather more complicated than an equation we could write down. So instead of trying to discover that explicitly ourselves, we turned to machine learning with Google Brain. We first took about 10,000 video clips and exhaustively tested every quality setting on each, measuring the resulting bitrate from each setting. This gave us 10,000 curves which in turn gave us 4 x 10,000 parameters measured from those curves.


The next step was to extract features from our video clips. Having generated the training data and the feature set, our machine learning system learned a “Brain” configuration that could predict the parameters from the features. Actually, we used both a simple “regression” technique and the Brain. Both outperformed our existing strategy. Although the process of training the Brain is relatively computationally heavy, the resulting system was actually quite simple and required only a few operations on our features. That meant that the compute load in production was small.
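As a rough illustration of the simple regression baseline (not the actual Brain model or feature set), one could predict the curve parameters from cheap features like this, using synthetic stand-in data:

```python
# Illustrative "simple regression" baseline with synthetic stand-in data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Stand-in training corpus: one row of cheap features per clip
# [input kbps, motion-vector kbps, width, height, fps] -- synthetic values here.
X_train = rng.uniform([500, 10, 640, 360, 24],
                      [50000, 500, 3840, 2160, 60], size=(10000, 5))
# Stand-in targets: the three curve parameters (a, b, c) fitted offline per clip.
Y_train = rng.normal([90.0, -0.15, 0.3], [20.0, 0.02, 0.1], size=(10000, 3))

# One multi-output linear regressor stands in for the "simple regression" baseline.
reg = LinearRegression().fit(X_train, Y_train)

def predict_crf(features, target_mbps):
    """Predict (a, b, c) for a new clip, then invert bitrate = a*exp(b*crf) + c."""
    a, b, c = reg.predict(np.asarray(features, dtype=float).reshape(1, -1))[0]
    return np.log((target_mbps - c) / a) / b

print(predict_crf([8000, 120, 1920, 1080, 30], target_mbps=5.0))
```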


Does it work?

The plot on the right shows the performance of the various systems on 10,000 video clips. Each point (x,y) represents the percentage of clips (y-axis) in which the resulting bitrate after compression is within x% of the target bitrate. The blue line shows the best case scenario where we use exhaustive search to get the perfect CRF for each clip. Any system that gets close to that is a good one. As you can see, at the 20% mark our old system (green line) would land within 20% of the target bitrate only 15% of the time. Now with our fancy Brain system we can do so 65% of the time if we use features from your upload only (red line), and better than 80% of the time (dashed line) using some features from a very fast low quality transcode.
[Chart: bitrate-target accuracy of the old system versus the new prediction-based systems and exhaustive search]


But does this actually look good? You may have noticed that we concentrated on our ability to hit a particular bitrate rather than specifically addressing picture quality. Our analysis of the problem showed that missing the bitrate target was the root cause of the quality issues. Pictures are the proof of the pudding and you can see some frames from a 720p video clip below (shot from a racing car). The top row shows two frames at the start and end of a typical chunk and you can see that the quality in the first frame is way worse than the last. The bottom row shows the frames in the same chunk using our new automated clip-adaptive system. In both cases the measured bitrate is the same at 2.8 Mbps. As you can see, the first frame is much improved and as a bonus the last frame looks better as well. So the temporal fluctuation in quality is gone and we also managed to improve the clip quality overall.




This concept has been used in production in our video infrastructure division for about a year. We are delighted to report it has helped us deliver very good quality streams for movies like "Titanic" and most recently "Spectre." We don’t expect anyone to notice, because they don’t know what it would look like otherwise.

But there is always more we can do to improve on video quality. We’re working on it. Stay tuned.

Anil Kokaram, Engineering Manager, AV Algorithms Team, recently watched "Tony Cozier speaking about the West Indies Cricket Heritage Centre," Yao Chung Lin, Software Engineer, Transcoder Team, recently watched "UNDER ARMOUR | RULE YOURSELF | MICHAEL PHELPS," Michelle Covell, Research Scientist, recently watched "Last Week Tonight with John Oliver: Scientific Studies (HBO)" and Sam John, Software Engineer, Transcoder Team, recently watched "Atlantis Found: The Clue in the Clay | History."



1Optimizing transcoder quality targets using a neural network with an embedded bitrate model, Michele Covell, Martin Arjovsky, Yao-Chung Lin and Anil Kokaram, Proceedings of the Conference on Visual Information Processing and Communications 2016, San Francisco
Multipass Encoding for reducing pulsing artefacts in cloud based video transcoding, Yao-Chung Lin, Anil Kokaram and Hugh Denman, IEEE International Conference on Image Processing, pp 907-911, Quebec 2015

Because retro is in — announcing historical data in the YouTube Reporting API

YouTube creators rely on data -- data about how their channel is performing, data about their video’s ratings, their earnings. Lots of data. That’s why we launched the YouTube Reporting API back in October, which helps you bulk up your data requests while keeping them on a low-quota diet.

Until now, reports made with the API only covered data from the day you scheduled them, going forward. Now that it’s been in the wild, we’ve heard another request loud and clear: you don’t just want current data, you want older data, too. We’re happy to announce that the Reporting API now delivers historical data covering the 180 days prior to when the reporting job is first scheduled (or back to July 1, 2015, whichever is later).

Developers with a keen eye may have already noticed this, as it launched a few weeks ago! Just in case you didn’t, you can find more information on how historical data works by checking out the Historical Data section of the Reporting API docs.

(Hint: if you’ve already got some jobs scheduled, you don’t need to do anything! We’ll generate the data automatically.)

New to the Reporting API? Tantalized by the possibility of all that historical data? Our documentation explains everything you need to know about scheduling jobs and the types of reports available. Try it out with our API Explorer, then dive into the sample code or write your own with one of our client libraries.
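If you’d like a feel for the flow before diving into the docs, here is a minimal sketch using the Python client library. The report type ID below is just an illustrative choice; list reportTypes to pick the right one for your use case, and the OAuth setup follows the standard client-library pattern.

```python
# Sketch: schedule a reporting job and list its generated reports.
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/yt-analytics.readonly"]

# Standard installed-app OAuth flow; client_secret.json comes from the API Console.
flow = InstalledAppFlow.from_client_secrets_file("client_secret.json", SCOPES)
credentials = flow.run_local_server()

youtube_reporting = build("youtubereporting", "v1", credentials=credentials)

# Schedule a job for a bulk report type ("channel_basic_a1" is illustrative;
# call reportTypes().list() to see what is available to your account).
job = youtube_reporting.jobs().create(
    body={"reportTypeId": "channel_basic_a1", "name": "Channel basics"}
).execute()

# Once reports (including the backfilled historical ones) have been generated,
# list them and grab their download URLs.
reports = youtube_reporting.jobs().reports().list(jobId=job["id"]).execute()
for report in reports.get("reports", []):
    print(report["startTime"], report["downloadUrl"])
```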

Happy reporting,

YouTube Developer Relations on behalf of Alvin Cham, Markus Lanthaler, Matteo Agosti, and Andy Diamondstein

Highlights from tonight’s YouTube Brandcast event: What’s next for Google Preferred

Tonight, YouTube hosted our fifth annual Brandcast in New York, our digital Newfronts event, where we showcase the best of YouTube to advertisers and agencies. With a little help from Big Bird, the Whip Nae Nae, and Sia, we showed that YouTube is where the world chooses to watch: on mobile alone, we now reach more 18-49 year-olds in the U.S. than any broadcast or cable TV network.1

As a result, our influence is greater than ever. In fact, nearly 60 percent of YouTube subscribers say they’d follow a YouTube creator’s advice on what to buy over that of their favorite TV or movie star. And the influence of YouTube stars continues to grow—last year, 6 in 10 of the most influential celebrities among teens were YouTubers, and this year that number jumped to 8 in 10.

To help brands capture some of this momentum, we have a premium content offering called Google Preferred that features creators like Lilly Singh, sWooZie and Laura Vitale. Organized into 13 lineup categories—including Comedy, Music, and Food & Recipes—Google Preferred is easy to buy and gives advertisers access to the most-loved and most-watched content on YouTube. In fact, Google Preferred reaches more people in the U.S. on mobile than all full episode players combined.2 And, we announced at Brandcast that Google Preferred just got better. Here’s how:

1. More Can’t-Miss Moments With Google Preferred Breakout Videos

Pop culture moments like the Harlem Shake and Let It Go are as unexpected as they are popular. While Google Preferred gives brands access to established creator channels, our new Breakout Videos offering lets them advertise on emerging content—the hottest and fastest-rising videos on YouTube. With the launch of Google Preferred Breakout Videos in the U.S., brands can be there alongside the next breakout star.

2. More Ease and Relevance with Programmatic Guaranteed for Google Preferred

To make it even easier to be there in the moment viewers are watching, upfront buyers will soon be able to execute their Google Preferred—and Breakout Videos—buy programmatically through DoubleClick Bid Manager. This means all video campaigns (including TrueView, Google Preferred, and cross-exchange) can be managed in one place.

3. More Sports Fans With NBA Highlights on Google Preferred

NBA Commissioner Adam Silver also took to the Brandcast stage to announce that NBA inventory is now part of Google Preferred. Top NBA highlights will surface in Google Search and on YouTube, offering brands access to the NBA’s loyal fan base across screens. In addition, the NBA will soon launch two new VR series on YouTube that give viewers a behind-the-scenes look at team arenas and their favorite players.

4. More Great Results for Brands

With new additions like Breakout Videos, Programmatic Guaranteed, and NBA highlights, Google Preferred will keep delivering massive reach, and strong results for brands.

Already this year, among the Google Preferred campaigns we measured, 75 percent drove lifts in consideration3 and 61 percent drove lifts in favorability.4 Google Preferred is driving results later in the consumer journey, too: it raised purchase intent in two-thirds of campaigns.5

Results like these have encouraged nearly twice the number of brands in the U.S. to take advantage of Google Preferred in the last year.6

Chime in with your Google Preferred questions and comments on #Brandcast, and check out six questions every brand should consider after Brandcast. We’ll post even more content after the show on the Brandcast website.

Kate Stanford, Head of Global Advertiser Marketing at YouTube, recently watched "Imagine the Possibilities."

1Google-commissioned Nielsen study U.S., December 2015. Audience Reach among Persons 18-49 for YouTube (mobile only) and 124 individual U.S. cable and broadcast networks (television only).

2Google/Millward Brown Digital Google Preferred Mobile Clickstream Analysis US, October 2015 (Full episode players on mobile browser or app include: Netflix, HBO, Hulu, NBC, Fox, CBS, ABC, sourced from the 20,000 person Mobile Compete Clickstream Panel).

3Google Preferred Brand Lift Meta Analysis, 2015 and Q1 2016. Results for 242 US Google Preferred Consideration Studies.

4Google Preferred Brand Lift Meta Analysis, 2015 and Q1 2016. Results for 44 US Google Preferred Favorability Studies.

5Google Preferred Brand Lift Meta Analysis, 2015 and Q1 2016. Results for 80 US Google Preferred Purchase Intent studies.

6Google data, US, Q1 2016 vs Q1 2015.

Source: YouTube Blog


Announcing the Mobile Data Plan API

More than half of YouTube watch time happens on mobile devices, with a large and rapidly increasing fraction of this time spent on cellular networks. At the same time, it is common for users to have mobile data plans with usage limits. Users who exhaust their quota can incur overage charges, have their data connections turned off, or have their speeds reduced. When this happens, application performance suffers and user satisfaction decreases.

At the root of this problem lies the fact that users do not have an easy way to share data plan information with an application, and, in turn, applications cannot optimize the user’s experience. In an effort to address this limitation we have worked with a few partners in the mobile ecosystem to specify an API that improves data transparency.

At a high level, the API comprises two parts. First, a mechanism for applications to establish an anonymous identifier of the user’s data plan. This new identifier, the Carrier Plan Identifier (CPID), protects the user’s identity and privacy. Second, a mechanism that allows applications, after establishing a CPID, to request information about the user’s data plan from the mobile network operator (MNO). Applications communicate with MNOs using HTTPS and the API encodes data plan information in an extensible JSON-based format.
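As a rough illustration only, the flow looks like this; the endpoint paths and JSON field names below are hypothetical, and the normative definitions live in the published specification.

```python
# Hedged sketch of the two-step flow: establish an anonymous CPID, then query
# the user's plan status from the operator over HTTPS as JSON.
import requests

MNO_BASE = "https://mobiledataplan.example-operator.com"   # hypothetical MNO endpoint

# Step 1: establish a Carrier Plan Identifier (CPID) for this subscriber.
cpid = requests.post(f"{MNO_BASE}/cpid", timeout=5).json()["cpid"]

# Step 2: ask the operator for plan information keyed by the anonymous CPID.
plan = requests.get(f"{MNO_BASE}/plans/{cpid}", timeout=5).json()

# The kind of JSON-encoded plan information the API describes might look like
# this (field names are illustrative, not the normative schema):
# {"planStatus": [{"quotaBytes": 2147483648,
#                  "quotaUsedBytes": 1900000000,
#                  "expirationTime": "2016-06-30T00:00:00Z"}]}
print(plan)
```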

We believe the API will improve transparency and Quality of Experience (QoE) for mobile applications such as YouTube. For example, the cost of data can depend on the time of day, with users getting discounts for using the network during off-peak hours. As another example, while users with unlimited data plans may prefer high resolution videos, users who are about to exceed their data caps or are on a busy network may be better served by reduced data rate streams that extend the life of the data plan while still providing good quality.
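A toy illustration of that kind of adaptation, with arbitrary thresholds chosen purely for the example:

```python
# Toy decision logic: pick a lower resolution as the data plan nears its cap.
def pick_resolution(quota_bytes: int, used_bytes: int, unlimited: bool = False) -> str:
    if unlimited:
        return "1080p"
    remaining = quota_bytes - used_bytes
    if remaining < 0.05 * quota_bytes:   # about to exceed the cap
        return "240p"
    if remaining < 0.25 * quota_bytes:   # running low
        return "480p"
    return "1080p"

print(pick_resolution(quota_bytes=2 * 1024**3, used_bytes=1900 * 1024**2))
```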

Cellular network constraints are even more acute in countries where the cost of data is high, users have small data budgets, and networks are overutilized. With more than 80% of views from outside the United States, YouTube is the first Google application conducting field trials of the Mobile Data Plan API in countries, such as Malaysia, Thailand, the Philippines and Guatemala, where these characteristics are more prominent. These trials aim to bring data plan information as an additional real-time input to YouTube’s decision engine tuned to improve QoE.

We believe the same data plan information will lay the foundation for other applications and mobile operators to innovate together. This collaboration can make data usage more transparent to users, incentivize efficient use of mobile networks, and optimize user experience.

We designed the API in cooperation with a number of key partners in the mobile ecosystem, including Telenor Group, Globe Telecom and Tigo, all of which have already adopted and implemented this API. Google also worked with Ericsson to support the Mobile Data Plan API in their OTT Cloud Connect platform. We invite other operators and equipment vendors to implement this solution and offer applicable products and services to their customers.

The Mobile Data Plan API specification is available from this link. We are looking forward to your comments and we are available at: [[email protected]].

Posted by Andreas Terzis, technical lead at Google Access, & Jessica Xu, product manager at YouTube.

A new mobile design for your Home

Whether you want to watch hilarious sketch comedy, your favorite vlogger, new let’s plays, or music videos, you should be able to see new videos you love every time you visit YouTube—right on your homepage. Starting today, when you open the YouTube app on your iPhone or Android phone, you’ll experience a redesigned Home, with a clean and simple format that invites you to discover and enjoy.

[Image: the redesigned Home feed with large thumbnails and creator icons]


Large, high resolution images make it easy to identify videos you want to watch, and a prominent icon highlights the creator for every video

This isn’t just a new coat of paint on the same old Home—we’ve coupled a fresh design with more relevant personalized recommendations that make it easier to discover videos you’ll be excited to watch. The new recommendation system is based on deep neural network technology, which means it can find patterns automatically and keep learning and improving as it goes. Every day, we recommend hundreds of millions of different videos on Home, billions of times, in 76 languages.

One of the biggest improvements is how the system suggests more recent videos and those from the creators you love. People who have tried the new system have spent more time watching fresh videos and content from their Subscriptions.

We hope you’ll enjoy your new Home, built just for you!

Brian Marquardt, Product Manager, YouTube Main App, recently watched "Troye Sivan: Youth" and Todd Beaupré, Product Manager, YouTube Discovery, recently watched "M83 - Do It, Try It."

Source: YouTube Blog


A look into YouTube’s video file anatomy

Over 1 billion people use YouTube, watching hundreds of millions of hours of content all over the world every day. We have been receiving content at a rate exceeding 100 hours/min for the last three years (currently at 400 hours/min). With those kinds of usage statistics, what we see on ingest actually says something about the state of video technology today.

Video files are the currency of video sharing and distribution over the web. Each file contains the video and audio data wrapped up in some container format and associated with metadata that describes the nature of the content in some way. To make sure each user can “Broadcast yourself” we have spent years building systems that can faithfully extract the video and audio data hidden inside almost any kind of file you can imagine. That is why when our users upload to YouTube they have confidence that their video and audio will always appear.

The video and audio data is typically compressed using a codec and of course the data itself comes in a variety of resolutions, frame rates, sample rates and channels (in the case of audio). As technology evolves, codecs get better, and the nature of the data itself changes, typically toward higher fidelity. But how much variety is there in this landscape and how has that variety changed with time? We’ve been analyzing the anatomy of files you’ve been uploading over the years and think it reflects how video technology has changed.

Audio/video file anatomy

Audio/video files contain audio and video media which can be played or viewed on a multimedia device like a TV, desktop or smartphone. Each pixel of video data is associated with values for brightness and color which tell the display how that pixel should appear. A quick calculation on the data rate for the raw video data shows that for 720p video at 30 frames per second the data rate is in excess of 420 Mbit/s. Raw audio data rates are smaller but still significant at about 1.5 Mbit/s for 44.1 kHz sampling with 16 bits per sample. These rates are well in excess of the tens of Mbit/s (at most) that many consumers have today. By using compression technology that same 400+ Mbit/s of data can be expressed in less than 5 Mbit/s. This means that audio and video compression is a vital part of any practical media distribution system. Without compression we would not be able to stream media over the internet in the way everyone enjoys now.
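For the curious, the arithmetic behind those figures works out as follows, assuming 16 bits per pixel for the raw 4:2:2 video and 16-bit stereo PCM for the audio (both assumptions for the sake of the example):

```python
# Back-of-the-envelope raw data rates quoted above.
video_bits_per_sec = 1280 * 720 * 30 * 16   # 720p, 30 fps, assumed 16 bits/pixel (4:2:2)
audio_bits_per_sec = 44_100 * 16 * 2        # 44.1 kHz, 16-bit samples, assumed stereo

print(video_bits_per_sec / 1e6)   # ~442 Mbit/s of raw video ("in excess of 420")
print(audio_bits_per_sec / 1e6)   # ~1.4 Mbit/s of raw audio ("about 1.5")
```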

There are three main components of media files today: the container, the compressed bitstream itself and finally metadata. The bitstream (called the video and audio “essence”) contains the actual audio and video media in a compressed form. It will also contain information about the size of the pictures and start and end of frames so that the codec knows how to decode the picture data in the right way. This information embedded in the bitstream is still not enough though. The “container” refers to the additional information that helps the decoder work out when a video frame is to be played, and when the audio data should be played relative to the frame. The container often also holds an index to the start of certain frames in the bitstream. This makes it easier for a player system to allow users to “seek” or “fast forward” through the contents. The container will also hold information about the file content itself like the author, and other kinds of “metadata” that could be useful for a rights holder or “menu” on a player for instance. So the bitstream contains the actual picture and audio, but the container lets the player know how that content should be played.
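You can inspect this anatomy yourself: ffprobe (shipped with ffmpeg) reports a file’s container and per-stream codecs, resolutions, frame rates and sample rates as JSON. A small sketch, assuming ffprobe is installed and on PATH:

```python
# Peek at a file's container and codecs with ffprobe.
import json
import subprocess

def file_anatomy(path):
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True).stdout
    info = json.loads(out)
    container = info["format"]["format_name"]             # e.g. "mov,mp4,m4a,3gp,3g2,mj2"
    codecs = [s["codec_name"] for s in info["streams"]]   # e.g. ["h264", "aac"]
    return container, codecs

print(file_anatomy("upload.mp4"))
```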

Standardization of containers and codecs was vital for the digital video industry to take off as it did in the late 1990s. The Moving Picture Experts Group (MPEG) was the consortium responsible and they are still active today. The interaction between containers and codecs has been so tight in the past that quite often the container and the codec might have the same name, because they arise from the same standards document. Needless to say, there are many different standards for the various components in a media file. Today we have MPEG and the Alliance for Open Media (AOM) emerging as the two major bodies engaged in creating new media compression and distribution technology. This is what makes the job of YouTube so challenging. We must correctly decode your content despite the endless variety, and despite the frequent errors and missing components in uploaded files. We deal with thousands of combinations of containers and codecs every week.

Containers

The plot below shows the percentage of uploaded files using each container, month on month, over the last five years. Each container is associated with the same color over time. The legend is ordered from the bottom up: the container type used in the largest fraction of uploads is at the bottom.



In 2011, MP4 (.mp4), Audio Video Interleave (.avi), Flash Video (.flv), Advanced Systems Format (.asf) and MPEG Transport Stream (.ts) were more equally distributed than they are now. But over the years MP4 has overtaken them all to become the most common ingest container format. Open source formats like WebM and Matroska seem to have been slowly gaining in popularity since about 2012, which is when we started rolling out the open source VP9 codec. Windows Media files (using the .asf container) and Flash Video have declined significantly. On the other end of the scale, files using Creative Labs video containers (for instance), which were popular before 2011, are hardly ever seen in our ingest today.

Codecs

The history of ingested codec types reflects the speed with which new codecs are adopted by hardware manufacturers and the makers of software editing and conforming systems. The chart below looks at the top ten video codecs back in 2011 and reveals how they have fared since then in our ingest profile. The VP range of codecs (VP6-VP8) does still figure in our ingest today and in fact, compared to 2011, VP8 ranks seventh in our top ten for 2015. Clearly H.264 is the dominant codec we see in use for upload to YouTube now, but MPEG4 and Windows Media bitstreams are still significant. This is very different from the situation in 2011 when almost every codec had a significant share of our ingest profile. This reflects how heterogeneous the video compression landscape was five years ago, with no dominant compression technology. The chart shows how rapidly the ecosystem moves to adopt a compression technology as soon as it proves itself: just five years. Uploads from mobile devices have also driven this trend as efficient codec technology enables more uploads from low-power devices with low bandwidth availability. In that time we have seen the almost complete erosion of Flash Video (FLV) and MPEG1/2 video for upload to YouTube, which all appear to have reached some kind of low-volume, steady-state behavior in our ingest.



The situation with audio codecs follows similar trends. The chart below shows the top 15 codecs we see on ingest, measured over 2015. Five years ago we saw a very heterogeneous landscape with Raw audio data (PCM), Windows Media (WMA), MPEG and Advanced Audio (AAC) all contributing significant proportions. Over the intervening time the AAC codec has grown to dominate the profile of audio codecs, but PCM, WMA and MP3 are still significant. It's interesting that we see a pretty steady rate of media with no audio at all (shown as “No Audio”), although the total proportion is of course small. The use of the VORBIS open source audio codec got a boost in 2012 when the new version was released. Although it is hard to see from the chart, OPUS follows a similar pattern with uploads starting to kick off in late 2012 once the reference software was available and then a boost in uploads in 2013 coinciding with the next API release.



Properties

But what about the nature of the video and audio media itself? Is there evidence to show that capture is increasing in resolution and color fidelity? This section reinforces the law that “on the internet everything gets bigger with time.”

Picture size

The chart below stacks the proportions of each resolution in our ingest against month. The legend shows the top ten resolutions by proportion of ingest as measured over the last year, with the topmost label being the largest proportion. There is always some disparity between “standard” picture sizes and the actual uploaded sizes. Those which do not fall into the labels used here are allocated to “OTHER.” Although the vast majority of our ingest shows standard picture sizes, that “OTHER” category has been persistently steady, showing that there will always be about 10 percent of our uploaders who upload non-standard sizes. The trend is clearly toward bigger pictures, with 480p dominating five years ago and HD (720p and 1080p) dominating now. It is interesting that we do not see step changes in behavior but rather a gradual acceleration to higher pixel densities. The 480p resolution does appear to be in a permanent decline however. 720p seems set to replace “vanilla” 480p in about a year.



With the 4K and 8K formats we see rapid take-up reflected in our ingest. The chart below breaks out just these two resolutions. Although understandably small as a proportion of the whole YouTube ingest profile, these formats are still significant and we notice that take-up spiked once announcements were made in 2013 (4K) and 2015 (8K). What is even more interesting is that the upload of 4K content started well before the “official” announcement of support. Our creators are always pushing the limits of our ingest and this is good evidence of that.



Audio channels

We observe that an increasing percentage of our media that contain audio have stereo audio tracks, as shown below in red. We also show here the relative amount of files having no audio (about 5 percent in 2015), and the trend here is similar to that in the audio codec chart shown previously. A growing proportion of tracks contain 5.1 material, but that is swamped by the amount of mono and stereo sound files. A linear extrapolation of the curves below would seem to imply that mono audio will decline to less than 5 percent of ingest in just over a year’s time.



Interlacing

Interlacing is still with us. This is the legacy TV broadcast practice of constructing a video frame from two half-height images that record the odd and even lines of the final frame, but at slightly different times. The fraction of our content that is interlaced on upload appears to be roughly 2-3 percent averaged over the last five years and there is no sign of that actually dwindling. This is perhaps because of the small but significant amount of made-for-TV content that is uploaded. The reasons for the observed rapid changes in some months are intriguing. One suggestion is a correlation with unusually large volumes of TV coverage, e.g., the 2012 Olympics and the U.S. election.



Color spaces

We are continually working on our ability to reproduce color faithfully between ingest and display. This is a notoriously challenging task across the consumer display industry for TVs, monitors and mobile devices. The first step to color nirvana is the correct specification of the color space in the associated video file. Although color space specifications have been in place for some time, it has taken a long while for file-based content to properly represent this data across a wide range of consumer devices. The chart below reflects our observations of the top five spaces we see. We started collecting information in 2012 and, compared to the stability in codecs and containers, the specification of color spaces in video data is clearly still evolving. It is only in the last three years that we have started to observe more consistent color labeling of video data, and as the chart below shows, BT709 (the default color space for HD resolution) has emerged as the dominant color space definition.

At the end of 2015 there was still an alarmingly large proportion of video files without any color information, more than 70 percent. Note that the vertical axis on the chart below starts from 70 percent. The trend in that proportion is downwards, and if we extend our curve of the decline in unspecified color spaces it would appear that it will be about a year before we can expect to see the majority of files having some color specification, and two years for almost all files to contain that metadata. At the end of 2015 we had also just started to observe files expressing the recent BT2020 color space being ingested. These of course account for a tiny proportion of ingest (< .005 percent). It does herald the start of the HDR technology rollout though (as BT2020 is a color space associated with that format) and reflects various announcements about HDR-capable devices made at CES 2016.



Frame rates

The chart below shows how the use of a range of frame rates has actually not changed that much over time. As expected, the U.S. and EU standards of 30 and 25 Hz, respectively, dominate the distribution. Less expected is that low frame rates of 15 fps and lower also account for a significant share of our ingest. This is because of the relatively large proportion of educational material, including slide shows, as well as music slide decks, that is uploaded to YouTube. That sort of material tends to be captured at low frame rates. High frame rate (HFR) material (e.g. 48 Hz and upwards) is a steady flow, especially since the announcement of HFR support in the YouTube player in 2014. Before 2014, the ceiling of our output target video was 30 fps, but since then we have raised the ceiling to 60 fps. However, the trend is not growing as quickly as, say, 1080p ingest itself. This possibly reflects bandwidth constraints on upload, as well as the fact that most capture today, especially on mobile devices, still defaults to 25 or 30 fps.



We continuously analyze both a wide-angle and a close-up view of video file activity worldwide. That has given us a unique perspective on the evolution of video technology. In a sense the data is a reflection of the consensus of device manufacturers and creators in the area of media capture and creation. So we can see the growing agreement around video codecs, frame rates and stereo audio. Color space specification is still very poor, however, and some expected consensus has not emerged. For example, in the area of HFR content creation, 60+ fps is not yet on the kind of growth curve that HD resolution has been on over the last year.

The data presented here show that even in the last five years the variability in data types and formats has been decreasing. However, as with many broadcasters and streaming sites, we see enough variability in our ingested file profiles that we remain keen on standardization activities. We look forward to the continued engagement of the YouTube and Google engineering community in SMPTE, MPEG and AOM activities.

Even with the dominance of certain technologies like H.264/AAC codecs and the MOV type containers, there will always be a small but significant portion of audio video data that falls outside the “consensus.” These small proportions are important to us however, because we want you to be confident that we’re going to do our darndest to help you broadcast yourself no matter what device you use to make your clip.

Anil Kokaram, Tech Lead/Engineering Manager, AV Algorithms Team, recently watched "Carlos Brathwaite's 4 sixes," Thierry Foucu, Tech Lead Transcoder Team, recently watched "Sale of the Century," and Yang Hu, Software Engineer, recently watched "MINECRAFT: How to build wooden mansion."

New YouTube live features: live 360, 1440p, embedded captions, and VP9 ingestion

Yesterday at NAB 2016 we announced exciting new live and virtual reality features for YouTube. We’re working to get you one step closer to actually being in the moments that matter while they are happening. Let’s dive into the new features and capabilities that we are introducing to make this possible:

Live 360: About a year ago we announced the launch of 360-degree videos at YouTube, giving creators a new way to connect to their audience and share their experiences. This week, we took the next step by introducing support for 360-degree live streaming on YouTube for all creators and viewers around the globe.

To make sure creators can tell awesome stories with virtual reality, we’ve been working with several camera and software vendors, such as ALLie and VideoStitch, to support this new feature. Manufacturers interested in 360 can use our YouTube Live Streaming API to send 360-degree live streams to YouTube.

Other 360-degree cameras can also be used to live stream to YouTube as long as they produce compatible output, for example, cameras that can act as a webcam over USB (see this guide for details on how to live stream to YouTube). Like 360-degree uploads, 360-degree live streams need to be streamed in the equirectangular projection format. Creators can use our Schedule Events interface to set up 360 live streams using this new option:

[Image: the 360-degree option in the Schedule Events interface]


Check out this help center page for some details.
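For developers, here is a hedged sketch of scheduling a 360-degree broadcast through the Live Streaming API with the Python client library. It assumes the broadcast’s contentDetails carries a projection flag for 360 content, and the title, start time and privacy setting are placeholders.

```python
# Hedged sketch: schedule a 360-degree live broadcast via the YouTube Data API v3.
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/youtube"]
flow = InstalledAppFlow.from_client_secrets_file("client_secret.json", SCOPES)
youtube = build("youtube", "v3", credentials=flow.run_local_server())

broadcast = youtube.liveBroadcasts().insert(
    part="snippet,contentDetails,status",
    body={
        "snippet": {
            "title": "360-degree live test",
            "scheduledStartTime": "2016-06-01T20:00:00Z",
        },
        # Assumed field: contentDetails.projection marks the stream as
        # equirectangular 360 rather than a standard rectangular video.
        "contentDetails": {"projection": "360"},
        "status": {"privacyStatus": "unlisted"},
    },
).execute()
# Bind the broadcast to a liveStream resource as usual (not shown here).
print("Scheduled 360 broadcast:", broadcast["id"])
```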



1440p live streaming: Content such as live 360 and video games is best enjoyed at high resolutions and high frame rates. We are also announcing support for 1440p 60fps live streams on YouTube. Live streams at 1440p have 70 percent more pixels than the standard HD resolution of 1080p. To ensure that your stream can be viewed on the broadest possible range of devices and networks, including those that don’t support such high resolutions or frame rates, we perform full transcoding on all streams and resolutions. A 1440p60 stream gets transcoded to 1440p60, 1080p60 and 720p60 as well as all resolutions from 1440p30 down to 144p30.

Support for 1440p will be available from our creation dashboard as well as our Live API. Creators interested in using this high resolution should make sure that their encoder is able to encode at such resolutions and that they have sufficient upload bandwidth on their network to sustain successful ingestion. A good rule of thumb is to provision at least twice the video bitrate.

VP9 ingestion / DASH ingestion: We are also announcing support for VP9 ingestion. VP9 is a modern video codec that lets creators upload higher-resolution videos with lower bandwidth, which is particularly important for high-resolution 1440p content. To facilitate the ingestion of this new video codec we are also announcing support for DASH ingestion, which is a simple, codec-agnostic HTTP-based protocol. DASH ingestion will support H.264 as well as VP9 and VP8. HTTP-based ingestion is more resilient to corporate firewalls and also allows ingestion over HTTPS. It is also a simpler protocol to implement for game developers that want to offer in-game streaming support with royalty-free video codecs. MediaExcel and Wowza Media Systems will both be demoing DASH VP9 encoding with YouTube live at their NAB booths.

We will soon publish a detailed guide to DASH Ingestion on our support web site. For developers interested in DASH Ingestion, please join this Google group to receive updates.
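In the meantime, here is a hedged sketch of producing a VP9 encode packaged as DASH with ffmpeg (driven from Python). The bitrate and keyframe settings are illustrative, and the actual handoff to YouTube’s ingestion endpoint is covered by the forthcoming guide rather than shown here.

```python
# Hedged sketch: VP9 + Opus encode packaged as a DASH manifest with ffmpeg.
import subprocess

subprocess.run([
    "ffmpeg", "-i", "game_capture.mp4",
    "-c:v", "libvpx-vp9", "-b:v", "12M",   # VP9 at an illustrative 1440p-friendly bitrate
    "-g", "60", "-keyint_min", "60",       # regular keyframes so segments align cleanly
    "-c:a", "libopus",
    "-f", "dash", "stream.mpd",            # writes the DASH manifest plus media segments
], check=True)
```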

Embedded captions: To provide more support to broadcasters, we now accept embedded EIA-608/CEA-708 captions over RTMP (H.264/AAC). That makes it easier to send captioned video content to YouTube and no longer requires posting caption data over side-band channels. We initially offer caption support for streams while they are live and will soon support the transitioning of caption data to the live recordings as well. Visit the YouTube Help Center for more information on our live captioning support.



We first launched live streaming back in 2011, and since then we’ve live streamed memorable moments: the 2012 Olympics, the Red Bull Stratos Jump, the League of Legends Championship, and the Coachella Music Festival. We are excited to see what our community can create with these new tools!

Nils Krahnstoever, Engineering Manager for Live
Kurt Wilms, Senior Product Manager for VR and Live
Sanjeev Verma, Product Manager for Video Formats