Author Archives: Edward Fernandez

Google Pixel 7 and Pixel 7 Pro: The next evolution in mobile security

Every day, billions of people around the world trust Google products to enrich their lives and provide helpful features – across mobile devices, smart home devices, health and fitness devices, and more. We keep more people safe online than anyone else in the world, with products that are secure by default, private by design and that put you in control. As our advancements in knowledge and computing grow to deliver more help across contexts, locations and languages, our unwavering commitment to protecting your information remains.

That’s why Pixel phones are designed from the ground up to help protect you and your sensitive data while keeping you in control. We’re taking our industry-leading approach to security and privacy to the next level with Google Pixel 7 and Pixel 7 Pro, our most secure and private phones yet, which were recently recognized as the highest rated for security when tested among other smartphones by a third-party global research firm.1

Pixel phones also get better every few months with Feature Drops that provide the latest product updates, tips and tricks from Google. And Pixel 7 and Pixel 7 Pro users will receive at least five years of security updates2, so your Pixel gets even more secure over time.

Your protection, built into Pixel

Your digital life and most sensitive information live on your phone: financial information, passwords, personal data, photos – you name it. With Google Tensor G2 and our custom Titan M2 security chip, Pixel 7 and Pixel 7 Pro have multiple layers of hardware security to help keep you and your personal information safe. We take a comprehensive, end-to-end approach to security with verifiable protections at each layer: the network, application, operating system and multiple layers on the silicon itself. If you use Pixel for your business, this approach helps protect your company data, too.

Google Tensor G2 is Pixel’s newest powerful processor custom built with Google AI, and makes Pixel 7 faster, more efficient and secure3. Every aspect of Tensor G2 was designed to improve Pixel's performance and efficiency for great battery life, amazing photos and videos.

Tensor’s built-in security core works with our Titan M2 security chip to keep your personal information, PINs and passwords safe. Titan family chips are also used to protect Google Cloud data centers and Chromebooks, so the same hardware that protects Google servers also secures your sensitive information stored on Pixel.

And, in a first for Google, Titan M2 hardware has now been certified under Common Criteria PP0084: the international gold standard for hardware security components also used for identity, SIM cards, and bankcard security chips.4 This means that the Titan M2 hardware meets the same rigorous protection guidelines trusted by banks, carriers, and governments.

To achieve the certification we went through rigorous third party lab testing by SGS Brightsight, a leading international security lab, and received certification against CC PP0084 with AVA_VAN.5 for the Titan M2 hardware and cryptography library from the Netherlands scheme for Certification in the Area of IT Security (NSCIB). Of all those numbers and acronyms the part we’re most proud of is that Titan hardware passed the highest level of vulnerability assessment (AVA_VAN.5) - the truest measure of resilience to advanced, methodical attacks.

This process took us more than three years to complete. The certification not only requires chip hardware to resist invasive penetration testing, but also mandates audits of the chip design and manufacturing process itself. The benefit for consumers? The now certified Titan M2 chip makes your phone even more resilient to sophisticated attacks.5

Private by design

Evolving our security and privacy standards to our fast-paced world requires new approaches as well. Earlier this year at I/O, we introduced Protected Computing, a toolkit of technologies that transforms how, when, and where personal data is processed to protect your privacy and security. Our approach focuses on:

  1. Minimizing your data footprint, by shrinking the amount of personally identifiable data altogether
  2. De-identifying data, with a range of anonymization techniques so it’s not linked to you
  3. Restricting data access using technologies like end-to-end encryption and secure enclaves.

Many elements of Protected Computing can be found on the new Pixel 7:

On Android, Private Compute Core keeps your information and AI-driven personalizations private with on-device processing. Data from features like Now Playing, Live Caption and Smart Reply in Messages are all processed on device and are never sent to Google to maintain your privacy. And even your device backups to the cloud are end-to-end encrypted using Titan in the cloud.6

With Google Tensor G2, Pixel’s advanced privacy protection also now covers audio data from events like cough and snore detection on Pixel 7.7 Audio data from cough and snore detection is never stored by or sent to Google to maintain your privacy.

On Pixel 7, Tensor G2 helps safeguard your system with the Android Virtualization Framework, unlocking improved security protections like enabling system update integrity checking to occur on-the-fly, reducing boot time after an update.

Extra protection when you’re online

Helping to keep you safe when you use your phone to browse the web and use apps is also critical. This is where a Virtual Private Network (VPN) comes in. A VPN helps protect your online activity from anyone who might try to access it by encrypting your network traffic to turn it into an unreadable format, and masking your original IP address. Typically, if you want a VPN on your phone, you need to get one from a third party.

To ensure more people have access to enhanced security, later this year, Pixel 7 and Pixel 7 Pro owners will be able to use VPN by Google One, at no extra cost.8 VPN by Google One is verifiably private, and will allow you to tap into Google’s world-class security for peace of mind when you connect online. With VPN by Google One, Pixel helps protect your online activity at a network level. Think of it like an extra layer of protection for your online security.

VPN by Google One creates a high-performance secure connection to the web so your browsing and app data is sent and received via an encrypted pathway. A few simple taps will activate the VPN to help keep your network traffic private from internet providers and hackers, giving you peace of mind when using cellular data, home Wi-Fi, and especially when connected to public networks, like a café or airport Wi-Fi. No need to worry about online intruders, hackers, or unsecure networks.

Unlike traditional VPN services, VPN by Google One uses Protected Computing to technically make it impossible for anyone at a network level, even VPN by Google One, to link your online traffic with your account or identity. VPN by Google One will be available at no extra cost as long as your phone continues to receive security updates. See here to learn more about VPN by Google One.

More protection and privacy with Android 13

Pixel 7 and Pixel 7 Pro have built-in anti-phishing protections from Android that scan for potential threats from phone calls, text messages and emails, and more anti-phishing protections enabled out-of-the-box than smartphones from leading competitors.9 In fact, Messages alone protects consumers against 1.5 billion spam messages per month.

Android also resets permissions for apps you haven’t used for an extended time. In a typical month, Android automatically resets more than 3 billion permissions affecting more than 1 billion installed apps. Similarly, if you use clipboard on Android 13, your history is automatically deleted after a period of time. This blocks apps running in the foreground from seeing old information that you previously copied.

You’re in control


Core to your safety is knowing that you’re in control. You always have control over your settings and devices across all of our products. With Android 13, coming soon through a Feature Drop, Pixel 7 and Pixel 7 Pro will give you additional ways to stay in control of your privacy and what you share with first and third-party apps. With Quick Settings, you can act on security issues as they arise, or review which apps are running in the background and easily stop them. You’ll have a single destination for reviewing your security and privacy settings, risk levels and information, making it easier to manage your safety status.

With this new experience, you can review actionable steps to improve your safety status, like revoking a permission or removing an app. This page will also have new action cards to notify you of any safety risks and provide timely recommendations on how to enhance your privacy. And with a single tap, you can grant or remove permissions to data that you don’t want to share with compatible apps. This experience will come first to Pixel devices later this year, and to other Android phones soon after.

Verifiably secure

As computing extends to more devices and use cases, Google is committed to innovating in security and being transparent about the processes that we take to get there. We are leading the industry in verifiable security by not only having products that are tested against real-world threats (like advanced spam, phishing and malware attacks), but also in publishing the results of penetration tests, security audits, and industry certifications across our Pixel and Nest products.

Another way to verify our security is through our Android and Google Devices Security Reward Program where we reward security researchers who find vulnerabilities across products, including Pixel, Nest and Fitbit. Last year on Android, we awarded nearly $3 million, creating a valuable feedback loop between us and the security research community and, most importantly, helping us keep our users safe.

To learn more about Pixel 7 and Pixel 7 Pro, check out the Google Store.

Notes


  1. Based on third-party global research firm. Evaluation considered features that may not be available in all countries. See here for more information.  

  2. Android version updates and feature drops for at least 3 years from when the device first became available on the Google Store in the US. Android security updates for at least 5 years from when the device first became available on the Google Store in the US. See g.co/pixel/updates for details. 

  3. Compared to Pixel 6. Speed and efficiency claims based on internal testing on pre-production devices.  

  4. Common Criteria certification for hardware and cryptographic library (CC PP0084 EAL4+, AVA_VAN.5 and ALC_DVS.2). See g.co/pixel/certifications for details. 

  5. Compared to Pixel 5a and earlier Pixel phones.  

  6. Excludes MMS attachments and Google Photos. 

  7. Not intended to diagnose, cure, mitigate, prevent or treat any disease or condition. Consult your healthcare professional if you have questions about your health. See g.co/pixel/digitalwellbeing for more information.  

  8. Coming soon. Restrictions apply. Some data is not transmitted through VPN. Not available in all countries. All other Google One membership benefits sold separately. This VPN offer does not impact price or benefits of Google One Premium plan. Use of VPN may increase data costs depending on your plan. See g.co/pixel/vpn for details. 

  9. Based on third-party research funded by Google LLC in June 2022. Evaluation based on no-cost smartphone features enabled by default. Some features may not be available in all countries. See here for more information. 

Use-after-freedom: MiraclePtr

Memory safety bugs are the most numerous category of Chrome security issues and we’re continuing to investigate many solutions – both in C++ and in new programming languages. The most common type of memory safety bug is the “use-after-free”. We recently posted about an exciting series of technologies designed to prevent these. Those technologies (collectively, *Scan, pronounced “star scan”) are very powerful but likely require hardware support for sufficient performance.

Today we’re going to talk about a different approach to solving the same type of bugs.

It’s hard, if not impossible, to avoid use-after-frees in a non-trivial codebase. It’s rarely a mistake by a single programmer. Instead, one programmer makes reasonable assumptions about how a bit of code will work, then a later change invalidates those assumptions. Suddenly, the data isn’t valid as long as the original programmer expected, and an exploitable bug results.

These bugs have real consequences. For example, according to Google Threat Analysis Group, a use-after-free in the Chrome HTML engine was exploited this year by North Korea.

Half of the known exploitable bugs in Chrome are use-after-frees:

Diving Deeper: Not All Use-After-Free Bugs Are Equal

Chrome has a multi-process architecture, partly to ensure that web content is isolated into a sandboxed “renderer” process where little harm can occur. An attacker therefore usually needs to find and exploit two vulnerabilities - one to achieve code execution in the renderer process, and another bug to break out of the sandbox.

The first stage is often the easier one. The attacker has lots of influence in the renderer process. It’s easy to arrange memory in a specific way, and the renderer process acts upon many different kinds of web content, giving a large “attack surface” that could potentially be exploited.

The second stage, escaping the renderer sandbox, is trickier. Attackers have two options for doing this:

  1. They can exploit a bug in the underlying operating system (OS) through the limited interfaces available inside Chrome’s sandbox.
  2. Or, they can exploit a bug in a more powerful, privileged part of Chrome - like the “browser” process. This process coordinates all the other bits of Chrome, so fundamentally has to be all-powerful.

We imagine the attackers squeezing through the narrow part of a funnel:

If we can reduce the size of the narrow part of the funnel, we will make it as hard as possible for attackers to assemble a full exploit chain. We can reduce the size of the orange slice by removing access to more OS interfaces within the renderer process sandbox, and we’re continuously working on that. The MiraclePtr project aims to reduce the size of the blue slice.

Here’s a sample of 100 recent high severity Chrome security bugs that made it to the stable channel, divided by root cause and by the process they affect.

You might notice:

  • This doesn’t quite add up to 100 - that’s because a few bugs were in other processes beyond the renderer or browser.
  • We claimed that the browser process is the more difficult part to exploit, yet there are more potentially-exploitable bugs! That may be so, but we believe they are typically harder to exploit because the attacker has less control over memory layout.

As you can see, the biggest category of bugs in each process is: V8 in the renderer process (JavaScript engine logic bugs - work in progress) and use-after-free bugs in the browser process. If we can make that “thin” bit thinner still by removing some of those use-after-free bugs, we make the whole job of Chrome exploitation markedly harder.

MiraclePtr: Preventing Exploitation of Use-After-Free Bugs

This is where MiraclePtr comes in. It is a technology to prevent exploitation of use-after-free bugs. Unlike aforementioned *Scan technologies that offer a non-invasive approach to this problem, MiraclePtr relies on rewriting the codebase to use a new smart pointer type, raw_ptr<T>. There are multiple ways to implement MiraclePtr. We came up with ~10 algorithms and compared the pros and cons. After analyzing their performance overhead, memory overhead, security protection guarantees, developer ergonomics, etc., we concluded that BackupRefPtr was the most promising solution.

The BackupRefPtr algorithm is based on reference counting. It relies on support from Chrome's own heap allocator, PartitionAlloc, which carves out a little extra space for a hidden reference count for each allocation. raw_ptr<T> increments or decrements the reference count when it’s constructed, destroyed or modified. When the application calls free/delete and the reference count is greater than 0, PartitionAlloc quarantines that memory region instead of immediately releasing it. The memory region is then only made available for reuse once the reference count reaches 0. Quarantined memory is poisoned to further reduce the likelihood that use-after-free accesses will result in exploitable conditions, and in the hope that future accesses lead to an easy-to-debug crash, turning these security issues into less-dangerous ones.

class A { ... };
class B {
 public:
  // The constructor must be public so that std::make_unique<B>() can call it.
  B(A* a) : a_(a) {}
  void doSomething() { a_->doSomething(); }

 private:
  raw_ptr<A> a_;  // MiraclePtr
};

std::unique_ptr<A> a = std::make_unique<A>();
std::unique_ptr<B> b = std::make_unique<B>(a.get());
[…]
a = nullptr; // The free is delayed because the MiraclePtr is still pointing to the object.
b->doSomething(); // Use-after-free is neutralized.
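
The quarantine behavior described above can also be modeled outside of C++. The following is a toy Python sketch, not PartitionAlloc's actual implementation: the class names, the poison byte value and the explicit acquire/release calls are all assumptions made for illustration.

```python
POISON = 0xEF  # hypothetical poison byte, chosen only for this sketch


class Slot:
    """One allocation plus the hidden reference count kept next to it."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.ref_count = 0     # number of raw_ptr<T>-like references
        self.freed = False     # the application called free/delete
        self.released = False  # the memory may be handed out again


class ToyAllocator:
    def alloc(self, size: int) -> Slot:
        return Slot(size)

    def acquire(self, slot: Slot) -> None:
        # Models constructing a raw_ptr<T> that points at the slot.
        slot.ref_count += 1

    def release(self, slot: Slot) -> None:
        # Models destroying or reassigning a raw_ptr<T>.
        slot.ref_count -= 1
        if slot.ref_count == 0 and slot.freed:
            slot.released = True  # quarantine ends, memory can be reused

    def free(self, slot: Slot) -> None:
        slot.freed = True
        if slot.ref_count > 0:
            # Quarantine: poison the memory but do not allow reuse yet.
            slot.data[:] = bytes([POISON]) * len(slot.data)
        else:
            slot.released = True


allocator = ToyAllocator()
slot = allocator.alloc(8)
allocator.acquire(slot)  # a protected pointer is created
allocator.free(slot)     # freed while a reference remains
quarantined = slot.freed and not slot.released
allocator.release(slot)  # the last reference goes away
```

In this model the dangling reference can only ever observe poisoned bytes, never memory that has been reallocated for another object.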

We successfully rewrote more than 15,000 raw pointers in the Chrome codebase into raw_ptr<T>, then enabled BackupRefPtr for the browser process on Windows and Android (both 64 bit and 32 bit) in Chrome 102 Stable. We anticipate that MiraclePtr meaningfully reduces the browser process attack surface of Chrome by protecting ~50% of use-after-free issues against exploitation. We are now working on enabling BackupRefPtr in the network, utility and GPU processes, and for other platforms. In the end state, our goal is to enable BackupRefPtr on all platforms because that ensures that a given pointer is protected for all users of Chrome.

Balancing Security and Performance

There is no free lunch, however. This security protection comes at a cost, which we have carefully weighed in our decision making.

Unsurprisingly, the main cost is memory. Luckily, related investments into PartitionAlloc over the past year led to 10-25% total memory savings, depending on usage patterns and platforms. So we were able to spend some of those savings on security: MiraclePtr increased the memory usage of the browser process 4.5-6.5% on Windows and 3.5-5% on Android1, still well below their previous levels. While we were worried about quarantined memory, in practice this is a tiny fraction (0.01%) of the browser process usage. By far the bigger culprit is the additional memory needed to store the reference count. One might think that adding 4 bytes to each allocation wouldn’t be a big deal. However, there are many small allocations in Chrome, so even the 4B overhead is not negligible. PartitionAlloc also uses pre-defined bucket sizes, so this extra 4B pushes certain allocations (particularly power-of-2 sized) into a larger bucket, e.g. 4096B->5120B.
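
To make the bucket effect concrete, here is a small arithmetic sketch. Only the 4096B and 5120B sizes come from the text above; the rest of the bucket list and the helper name are assumptions for illustration and may not match PartitionAlloc's real bucket table.

```python
from bisect import bisect_left

# Illustrative bucket sizes; only 4096 and 5120 are taken from the article.
BUCKETS = [2048, 2560, 3072, 3584, 4096, 5120, 6144]
REF_COUNT_BYTES = 4  # size of the hidden reference count


def bucket_for(requested: int, extra: int = 0) -> int:
    """Return the smallest bucket that fits the request plus any extra bytes."""
    index = bisect_left(BUCKETS, requested + extra)
    if index == len(BUCKETS):
        raise ValueError("request larger than the largest bucket in this sketch")
    return BUCKETS[index]


without_ref_count = bucket_for(4096)
with_ref_count = bucket_for(4096, REF_COUNT_BYTES)
growth = (with_ref_count - without_ref_count) / without_ref_count

print(without_ref_count, with_ref_count, f"{growth:.0%}")
```

Under these assumed buckets, a 4-byte addition to a 4096-byte request costs a full 1024 extra bytes, which is why power-of-2 sized allocations are hit hardest.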

We also considered the performance cost. Adding an atomic increment/decrement on common operations such as pointer assignment has unavoidable overhead. Having excluded a number of performance-critical pointers, we drove this overhead down until we could gain back the same margin through other performance optimizations. On Windows, no statistically significant performance regressions were observed on most of our top-level performance metrics like Largest Contentful Paint, First Input Delay, etc. The only adverse change there1 is an increase of the main thread contention (~7%). On Android1, in addition to a similar increase in the main thread contention (~6%), there were small regressions in First Input Delay (~1%), Input Delay (~3%) and First Contentful Paint (~0.5%). We don't anticipate these regressions to have a noticeable impact on user experience, and are confident that they are strongly outweighed by the additional safety for our users.

We should emphasize that MiraclePtr currently protects only class/struct pointer fields, to minimize the overhead. As future work, we are exploring options to expand the pointer coverage to on-stack pointers so that we can protect against more use-after-free bugs.

Note that the primary goal of MiraclePtr is to prevent exploitation of use-after-free bugs. Although it wasn’t designed for diagnosability, it already helped us find and fix a number of bugs that were previously undetected. We have ongoing efforts to make MiraclePtr crash reports even more informative and actionable.

Continue to Provide Us Feedback

Last but not least, we’d like to encourage security researchers to continue to report issues through the Chrome Vulnerability Reward Program, even if those issues are mitigated by MiraclePtr. We still need to make MiraclePtr available to all users, collect more data on its impact through reported issues, and further refine our processes and tooling. Until that is done, we will not consider MiraclePtr when determining the severity of a bug or the reward amount.

1 Measured in Chrome 99.

How Hash-Based Safe Browsing Works in Google Chrome

By Rohit Bhatia, Mollie Bates, Google Chrome Security

There are various threats a user faces when browsing the web. Users may be tricked into sharing sensitive information like their passwords with a misleading or fake website, also called phishing. They may also be led into installing malicious software on their machines, called malware, which can collect personal data and also hold it for ransom. Google Chrome, henceforth called Chrome, enables its users to protect themselves from such threats on the internet. When Chrome users browse the web with Safe Browsing protections, Chrome uses the Safe Browsing service from Google to identify and ward off various threats.

Safe Browsing works in different ways depending on the user's preferences. In the most common case, Chrome uses the privacy-conscious Update API (Application Programming Interface) from the Safe Browsing service. This API was developed with user privacy in mind and ensures Google gets as little information about the user's browsing history as possible. If the user has opted in to "Enhanced Protection" (covered in an earlier post) or "Make Searches and Browsing Better", Chrome shares limited additional data with Safe Browsing only to further improve user protection.

This post describes how Chrome implements the Update API, with appropriate pointers to the technical implementation and details about the privacy-conscious aspects of the Update API. This should be useful for users to understand how Safe Browsing protects them, and for interested developers to browse through and understand the implementation. We will cover the APIs used for Enhanced Protection users in a future post.

Threats on the Internet

When a user navigates to a webpage on the internet, their browser fetches objects hosted on the internet. These objects include the structure of the webpage (HTML), the styling (CSS), dynamic behavior in the browser (JavaScript), images, downloads initiated by the navigation, and other webpages embedded in the main webpage. These objects, also called resources, have a web address which is called their URL (Uniform Resource Locator). Further, URLs may redirect to other URLs when being loaded. Each of these URLs can potentially host threats such as phishing websites, malware, unwanted downloads, malicious software, unfair billing practices, and more. Chrome with Safe Browsing checks all URLs, redirects or included resources, to identify such threats and protect users.

Safe Browsing Lists

Safe Browsing provides a list for each threat it protects users against on the internet. A full catalog of lists that are used in Chrome can be found by visiting chrome://safe-browsing/#tab-db-manager on desktop platforms.

A list does not contain unsafe web addresses, also referred to as URLs, in their entirety; it would be prohibitively expensive to keep all of them in a device’s limited memory. Instead it maps a URL, which can be very long, through a cryptographic hash function (SHA-256), to an effectively unique fixed-size string. This fixed-size string, called a hash, allows a list to be stored efficiently in limited memory. The Update API handles URLs only in the form of hashes and is also called hash-based API in this post.

Further, a list does not store hashes in entirety either, as even that would be too memory intensive. Instead, barring a case where data is not shared with Google and the list is small, it contains prefixes of the hashes. We refer to the original hash as a full hash, and a hash prefix as a partial hash.
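
As a minimal sketch of the full hash and partial hash relationship, the snippet below hashes a URL expression with SHA-256 and keeps a 4-byte prefix. The function names are hypothetical, and the snippet skips the canonicalization that Chrome performs before hashing.

```python
import hashlib


def full_hash(expression: str) -> bytes:
    """SHA-256 digest (32 bytes) of a URL expression."""
    return hashlib.sha256(expression.encode("utf-8")).digest()


def partial_hash(expression: str, prefix_len: int = 4) -> bytes:
    """Leading bytes of the full hash, as stored in a local list."""
    return full_hash(expression)[:prefix_len]


expression = "evil.example.com/blah"
print(full_hash(expression).hex())     # 64 hex characters
print(partial_hash(expression).hex())  # 8 hex characters
```

However long the URL expression is, the stored prefix stays at 4 bytes.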

A list is updated following the Update API’s request frequency section. Chrome also follows a back-off mode in case of an unsuccessful response. These updates happen roughly every 30 minutes, following the minimum wait duration set by the server in the list update response.

For those interested in browsing relevant source code, here’s where to look:

Source Code

  1. GetListInfos() contains all the lists, along with their associated threat types, the platforms they are used on, and their file names on disk.
  2. HashPrefixMap shows how the lists are stored and maintained. They are grouped by the size of prefixes, and appended together to allow quick binary search based lookups.
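
The idea of appending same-sized prefixes together and searching them with binary search can be sketched as follows. This is an illustration of the technique only, not the HashPrefixMap code; the function name is an assumption, and the 2-byte prefixes are the illustrative values used later in this post.

```python
def contains_prefix(blob: bytes, prefix_size: int, full_hash: bytes) -> bool:
    """Binary search a sorted, concatenated run of fixed-size prefixes."""
    target = full_hash[:prefix_size]
    lo, hi = 0, len(blob) // prefix_size
    while lo < hi:
        mid = (lo + hi) // 2
        candidate = blob[mid * prefix_size:(mid + 1) * prefix_size]
        if candidate == target:
            return True
        if candidate < target:
            lo = mid + 1
        else:
            hi = mid
    return False


# All prefixes of one size, sorted and appended into a single byte string.
prefixes = sorted(bytes.fromhex(h) for h in ["036b", "1a02", "bac8", "bb90"])
blob = b"".join(prefixes)

# Hypothetical 32-byte full hashes, padded with zeros for the demonstration.
listed = bytes.fromhex("1a02") + bytes(30)
unlisted = bytes.fromhex("7a9e") + bytes(30)

print(contains_prefix(blob, 2, listed))
print(contains_prefix(blob, 2, unlisted))
```

Storing the prefixes back to back avoids per-entry overhead, and the fixed stride makes each lookup logarithmic in the number of prefixes.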

How is hash-based URL lookup done

As an example of a Safe Browsing list, let's say that we have one for malware, containing partial hashes of URLs known to host malware. These partial hashes are generally 4 bytes long, but for illustrative purposes, we show only 2 bytes.

['036b', '1a02', 'bac8', 'bb90']

Whenever Chrome needs to check the reputation of a resource with the Update API, for example when navigating to a URL, it does not share the raw URL (or any piece of it) with Safe Browsing to perform the lookup. Instead, Chrome uses full hashes of the URL (and some combinations) to look up the partial hashes in the locally maintained Safe Browsing list. Chrome sends only these matched partial hashes to the Safe Browsing service. This ensures that Chrome provides these protections while respecting the user’s privacy. This hash-based lookup happens in three steps in Chrome:

Step 1: Generate URL Combinations and Full Hashes

When Google blocks URLs that host potentially unsafe resources by placing them on a Safe Browsing list, the malicious actor can host the resource on a different URL. A malicious actor can cycle through various subdomains to generate new URLs. Safe Browsing uses host suffixes to identify malicious domains that host malware in their subdomains. Similarly, malicious actors can also cycle through various subpaths to generate new URLs. So Safe Browsing also uses path prefixes to identify websites that host malware at various subpaths. This prevents malicious actors from cycling through subdomains or paths for new malicious URLs, allowing robust and efficient identification of threats.

To incorporate these host suffixes and path prefixes, Chrome first computes the full hashes of the URL and some patterns derived from the URL. Following Safe Browsing API's URLs and Hashing specification, Chrome computes the full hashes of URL combinations by following these steps:

  1. First, Chrome converts the URL into a canonical format, as defined in the specification.
  2. Then, Chrome generates up to 5 host suffixes/variants for the URL.
  3. Then, Chrome generates up to 6 path prefixes/variants for the URL.
  4. Then, for each of the up to 30 combinations of host suffixes and path prefixes, Chrome generates the full hash.

Source Code

  1. V4LocalDatabaseManager::CheckBrowseURL is an example which performs a hash-based lookup.
  2. V4ProtocolManagerUtil::UrlToFullHashes creates the various URL combinations for a URL, and computes their full hashes.

Example

For instance, let's say that a user is trying to visit https://evil.example.com/blah#frag. The canonical URL is https://evil.example.com/blah. The host suffixes to be tried are evil.example.com and example.com. The path prefixes are / and /blah. The four resulting URL combinations are evil.example.com/, evil.example.com/blah, example.com/, and example.com/blah.

url_combinations = ["evil.example.com/", "evil.example.com/blah", "example.com/", "example.com/blah"]
full_hashes = ['1a02…28', 'bb90…9f', '7a9e…67', 'bac8…fa']
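
The example above can be reproduced with a simplified sketch. It assumes the URL is already canonical and the host is not an IP address, and the function names are hypothetical. The hex values printed here are real SHA-256 digests, so they are not expected to match the illustrative hashes shown above.

```python
import hashlib


def host_suffixes(host: str) -> list[str]:
    """Exact host, plus up to four suffixes taken from its last five components."""
    parts = host.split(".")
    suffixes = [host]
    start = max(1, len(parts) - 5)
    # Stop before the final component so the bare top-level domain is never used.
    for i in range(start, len(parts) - 1):
        suffixes.append(".".join(parts[i:]))
    return suffixes


def path_prefixes(path: str, query: str = "") -> list[str]:
    """Exact path (with and without query), plus up to four directory prefixes."""
    prefixes = [f"{path}?{query}"] if query else []
    prefixes.append(path)
    components = [c for c in path.split("/") if c]
    if not path.endswith("/"):
        components = components[:-1]  # the last component is not a directory
    current = "/"
    candidates = [current]
    for component in components:
        current += component + "/"
        candidates.append(current)
    for candidate in candidates[:4]:
        if candidate not in prefixes:
            prefixes.append(candidate)
    return prefixes


def url_combinations(host: str, path: str, query: str = "") -> list[str]:
    return [h + p for h in host_suffixes(host) for p in path_prefixes(path, query)]


combos = url_combinations("evil.example.com", "/blah")
hashes = {c: hashlib.sha256(c.encode("utf-8")).hexdigest() for c in combos}
for combo, digest in hashes.items():
    print(combo, digest[:8])
```

With at most 5 host suffixes and 6 path prefixes, this never produces more than 30 combinations.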

Step 2: Search Partial Hashes in Local Lists

Chrome then checks the full hashes of the URL combinations against the locally maintained Safe Browsing lists. These lists, which contain partial hashes, do not provide a decisive malicious verdict, but can quickly identify if the URL is considered not malicious. If none of the full hashes of the URL combinations matches a partial hash in the local lists, the URL is considered safe and Chrome proceeds to load it. This happens for more than 99% of the URLs checked.

Source Code

  1. V4LocalDatabaseManager::GetPrefixMatches gets the matching partial hashes for the full hashes of the URL and its combinations.

Example

Chrome finds that three full hashes 1a02…28, bb90…9f, and bac8…fa match local partial hashes. We note that this is for demonstration purposes, and a match here is rare.

Step 3: Fetch Matching Full Hashes

Next, Chrome sends only the matching partial hashes (not the full URL or any particular part of the URL, or even their full hashes) to the Safe Browsing service's fullHashes.find method. In response, it receives the full hashes of all malicious URLs for which the full hash begins with one of the partial hashes sent by Chrome. Chrome checks the fetched full hashes against the generated full hashes of the URL combinations. If any match is found, it identifies the URL with various threats and their severities inferred from the matched full hashes.

Source Code

  1. V4GetHashProtocolManager::GetFullHashes performs the lookup for the full hashes for the matched partial hashes.

Example

Chrome sends the matched partial hashes 1a02, bb90, and bac8 to fetch the full hashes. The server returns full hashes that match these partial hashes, 1a02…28, bb90…ce, and bac8…01. Chrome finds that one of the full hashes matches with the full hash of the URL combination being checked, and identifies the malicious URL as hosting malware.
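
Steps 2 and 3 of the running example can be traced with a short sketch. The elided hash strings are treated as opaque labels exactly as written above, the 2-byte prefixes are the illustrative ones, and the server is replaced by a hypothetical stub; none of this reflects the real fullHashes.find wire format.

```python
# Illustrative values copied from the examples in this post.
local_partial_hashes = {"036b", "1a02", "bac8", "bb90"}
full_hashes = {
    "evil.example.com/": "1a02…28",
    "evil.example.com/blah": "bb90…9f",
    "example.com/": "7a9e…67",
    "example.com/blah": "bac8…fa",
}


def fake_full_hash_lookup(prefixes: list[str]) -> list[str]:
    """Hypothetical stand-in for the server: full hashes known for the prefixes."""
    known = {"1a02": ["1a02…28"], "bb90": ["bb90…ce"], "bac8": ["bac8…01"]}
    return [h for p in prefixes for h in known.get(p, [])]


# Step 2: keep only the prefixes that appear in the local list.
matched_prefixes = [h[:4] for h in full_hashes.values() if h[:4] in local_partial_hashes]

# Step 3: send the matched prefixes, then compare the returned full hashes locally.
server_hashes = set(fake_full_hash_lookup(matched_prefixes))
unsafe = [url for url, h in full_hashes.items() if h in server_hashes]

print(matched_prefixes)
print(unsafe)
```

Only the short prefixes leave the device in this flow; the final comparison against the full hashes happens locally.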

Conclusion

Safe Browsing protects Chrome users from various malicious threats on the internet. While providing these protections, Chrome faces challenges such as constraints in memory capacity, network bandwidth usage, and a dynamic threat landscape. Chrome is also mindful of the users’ privacy choices, and shares little data with Google.

In a follow up post, we will cover the more advanced protections Chrome provides to its users who have opted in to “Enhanced Protection”.

DNS-over-HTTP/3 in Android

Posted by Matthew Maurer and Mike Yu, Android team

To help keep Android users’ DNS queries private, Android supports encrypted DNS. In addition to existing support for DNS-over-TLS, Android now supports DNS-over-HTTP/3 which has a number of improvements over DNS-over-TLS.

Most network connections begin with a DNS lookup. While transport security may be applied to the connection itself, that DNS lookup has traditionally not been private by default: the base DNS protocol is raw UDP with no encryption. While the internet has migrated to TLS over time, DNS has a bootstrapping problem. Certificate verification relies on the domain of the other party, which requires either DNS itself, or moves the problem to DHCP (which may be maliciously controlled). This issue is mitigated by central resolvers like Google, Cloudflare, OpenDNS and Quad9, which allow devices to configure a single DNS resolver locally for every network, overriding what is offered through DHCP.

In Android 9.0, we announced the Private DNS feature, which uses DNS-over-TLS (DoT) to protect DNS queries when enabled and supported by the server. Unfortunately, DoT incurs overhead for every DNS request. An alternative encrypted DNS protocol, DNS-over-HTTPS (DoH), is rapidly gaining traction within the industry as DoH has already been deployed by most public DNS operators, including the Cloudflare Resolver and Google Public DNS. While using HTTPS alone will not reduce the overhead significantly, HTTP/3 uses QUIC, a transport that efficiently multiplexes multiple streams over UDP using a single TLS session with session resumption. All of these features are crucial to efficient operation on mobile devices.

DNS-over-HTTP/3 (DoH3) support was released as part of a Google Play system update, so by the time you’re reading this, Android devices from Android 11 onwards[1] will use DoH3 instead of DoT for well-known[2] DNS servers which support it. Which DNS service you are using is unaffected by this change; only the transport will be upgraded. In the future, we aim to support Discovery of Designated Resolvers (DDR), which will allow us to dynamically select the correct configuration for any server. DoH3 should decrease the performance impact of encrypted DNS.
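The "only the transport will be upgraded" behavior can be sketched as a lookup against a table of well-known servers. This is a hypothetical illustration: the table, the function name and the return shape are assumptions, and the post does not show Android's actual list. The two entries are the public DoH endpoints of the services named in the footnote.

```python
# Illustrative table only: the post does not show Android's actual list.
WELL_KNOWN_DOH = {
    "dns.google": "https://dns.google/dns-query",
    "cloudflare-dns.com": "https://cloudflare-dns.com/dns-query",
}

def select_transport(private_dns_hostname: str):
    """Return (transport, target) for the user's configured resolver."""
    host = private_dns_hostname.strip().rstrip(".").lower()
    doh_url = WELL_KNOWN_DOH.get(host)
    if doh_url is not None:
        # Same DNS service, upgraded transport.
        return ("doh3", doh_url)
    # Servers not in the table keep using DoT on port 853 (RFC 7858).
    return ("dot", f"{host}:853")

print(select_transport("dns.google"))
print(select_transport("resolver.example"))
```

A static table like this is what DDR would replace, since the resolver itself would then advertise which encrypted transports it supports.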

Performance

DNS-over-HTTP/3 avoids several problems that can occur with DNS-over-TLS operation:

  • As DoT operates on a single stream of requests and responses, many server implementations suffer from head-of-line blocking[3]. This means that if the request at the front of the line takes a while to resolve (possibly because a recursive resolution is necessary), responses for subsequent requests that would otherwise have been resolved quickly are blocked waiting on that first request. DoH3, by comparison, runs each request over a separate logical stream, which means implementations will resolve requests out of order by default.
  • Mobile devices change networks frequently as the user moves around. With DoT, these events require a full renegotiation of the connection. By contrast, the QUIC transport HTTP/3 is based on can resume a suspended connection in a single RTT.
  • DoT intends for many queries to use the same connection to amortize the cost of TCP and TLS handshakes at the start. Unfortunately, in practice several factors (such as network disconnects or server TCP connection management) make these connections less long-lived than we might like. Once a connection is closed, establishing the connection again requires at least 1 RTT.

In unreliable networks, DoH3 may even outperform traditional DNS. Although this is counterintuitive, it happens because the flow control mechanisms in QUIC can alert either party that packets weren’t received. In traditional DNS, the timeout for a query needs to be based on the expected time for the entire query, not just the time for the resolver to receive the packet.
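The head-of-line blocking described in the first bullet can be illustrated with a toy model. It assumes all requests are sent at time 0 and that the server works on them in parallel. The only difference between the two functions is whether responses must leave in request order. The resolve times are made-up values chosen for illustration, not measurements.

```python
def in_order_completion(resolve_times):
    """Responses leave in request order, so each waits for all earlier ones."""
    completed, latest = [], 0
    for t in resolve_times:
        latest = max(latest, t)
        completed.append(latest)
    return completed

def out_of_order_completion(resolve_times):
    """Each response leaves as soon as its own resolution finishes."""
    return list(resolve_times)

# Hypothetical resolve times in milliseconds: one slow recursive lookup
# at the front of the line, followed by three cache hits.
times_ms = [200, 5, 8, 3]

print(in_order_completion(times_ms))      # [200, 200, 200, 200]
print(out_of_order_completion(times_ms))  # [200, 5, 8, 3]
```

In the in-order case the three fast answers inherit the latency of the slow one. With one logical stream per request, they do not.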

Field measurements during the initial limited rollout of this feature show that DoH3 significantly improves on DoT’s performance. For successful queries, our studies showed that replacing DoT with DoH3 reduces median query time by 24%, and 95th percentile query time by 44%. While it might seem suspect that the reported data is conditioned on successful queries, both DoT and DoH3 resolve 97% of queries successfully, so their metrics are directly comparable. UDP resolves only 83% of queries successfully. As a result, UDP latency is not directly comparable to TLS/HTTP/3 latency because non-connection-oriented protocols have a different notion of what a "query" is. We have still included it for rough comparison.
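As a rough sketch of how such figures can be derived, the code below computes the success rate, median and 95th percentile over successful queries only, then the relative reduction between two transports. The sample values are invented for illustration and are not the field data from the rollout. The nearest-rank percentile definition is an assumption, since the post does not say which definition was used.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def summarize(results):
    """results: list of (latency_ms, succeeded) pairs for one transport."""
    ok = [latency for latency, succeeded in results if succeeded]
    return {
        "success_rate": len(ok) / len(results),
        "median": percentile(ok, 50),
        "p95": percentile(ok, 95),
    }

def reduction(before, after):
    """Relative reduction, e.g. 0.25 means 25% lower."""
    return (before - after) / before

# Invented samples (latency in ms, success flag); NOT measurements from the post.
baseline = [(40, True), (50, True), (60, True), (80, True), (200, True), (0, False)]
candidate = [(30, True), (40, True), (45, True), (60, True), (100, True), (0, False)]

b, c = summarize(baseline), summarize(candidate)
print(reduction(b["median"], c["median"]))  # 0.25
print(reduction(b["p95"], c["p95"]))        # 0.5
```

Comparing latencies conditioned on success is only fair when the success rates match, as they do in this toy data and as the post reports for DoT and DoH3.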

Memory Safety

The DNS resolver processes input that could potentially be controlled by an attacker, both from the network and from apps on the device. To reduce the risk of security vulnerabilities, we chose to use a memory safe language for the implementation.

Fortunately, we’ve been adding Rust support to the Android platform. This effort is intended exactly for cases like this — system level features which need to be performant or low level (both in this case) and which would carry risk to implement in C++. While we’ve previously launched Keystore 2.0, this represents our first foray into Rust in Mainline Modules. Cloudflare maintains an HTTP/3 library called quiche, which fits our use case well, as it has a memory-safe implementation, few dependencies, and a small code size. Quiche also supports use directly from C++. We considered this, but even the request dispatching service had sufficient complexity that we chose to implement that portion in Rust as well.

We built the query engine using the Tokio async framework to simultaneously handle new requests, incoming packet events, control signals, and timers. In C++, this would likely have required multiple threads or a carefully crafted event loop. By leveraging asynchronous Rust, this occurs on a single thread with minimal locking[4]. The DoH3 implementation is 1,640 lines and uses a single runtime thread. By comparison, DoT takes 1,680 lines while managing less and using up to 4 threads per DoT server in use.
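The single-threaded pattern described above can be sketched by analogy. The actual engine is written in Rust on Tokio. The example below uses Python's asyncio instead and is not a port of the Android code. The names `resolve` and `engine` and the delays are hypothetical, and `asyncio.sleep` stands in for waiting on network events. It shows several in-flight requests and a timeout being handled concurrently on one thread.

```python
import asyncio
import threading

async def resolve(name, delay, log):
    # Stand-in for awaiting a network response; no real DNS traffic is sent.
    await asyncio.sleep(delay)
    log.append((name, threading.get_ident()))

async def engine(requests, timeout):
    """Run all requests concurrently and cancel any still pending at the timeout."""
    log = []
    tasks = [asyncio.create_task(resolve(name, delay, log)) for name, delay in requests]
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        task.cancel()
    # Let cancelled tasks finish unwinding before returning.
    await asyncio.gather(*pending, return_exceptions=True)
    return log, len(pending)

log, timed_out = asyncio.run(
    engine(
        [("slow.example", 0.05), ("fast.example", 0.01), ("stuck.example", 30.0)],
        timeout=0.5,
    )
)
print([name for name, _ in log])            # ['fast.example', 'slow.example']
print(timed_out)                            # 1
print(len({thread for _, thread in log}))   # 1
```

Completions arrive in the order the simulated responses come back, the stuck request is cancelled by the timer, and every step runs on the same thread without explicit locks.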

Safety and Performance — Together at Last

With the introduction of Rust, we are able to improve both security and performance at the same time. Likewise, QUIC allows us to improve network performance and privacy simultaneously. Finally, Mainline ensures that such improvements are able to make their way to more Android users sooner.

Acknowledgements

Special thanks to Luke Huang, who contributed greatly to the development of this feature, and Lorenzo Colitti for his in-depth review of the technical aspects of this post.


  1. Some Android 10 devices which adopted Google Play system updates early will also receive this feature. 

  2. Google DNS and Cloudflare DNS at launch; others may be added in the future.

  3. DoT can be implemented in a way that avoids this problem, as the client must accept server responses out of order. However, in practice most servers do not implement this reordering. 

  4. There is a lock used for the SSL context which is accessed once per DNS server, and another on the FFI when issuing a request. The FFI lock could be removed with changes to the C++ side, but has remained because it is low contention.