Tag Archives: Performance

Performance Class helps Google Maps deliver premium experiences

Posted by Nevin Mital - Developer Relations Engineer, Android Media

The Android ecosystem features a diverse range of devices, and it can be difficult to build experiences that take advantage of new or premium hardware features while still working well for users on all devices. With Android 12, we introduced the Media Performance Class (MPC) standard to help developers better understand a device’s capabilities and identify high-performing devices. For a refresher on what MPC is, please see our last blog post, Using performance class to optimize your user experience, or check out the Performance Class documentation.

Earlier this year, we published the first stable release of the Jetpack Core Performance library as the recommended solution for more reliably obtaining a device’s MPC level. In particular, this library introduces the PlayServicesDevicePerformance class, an API that queries Google Play Services to get the most up-to-date MPC level for the current device and build. I’ll get into the technical details further down, but let’s start by taking a look at how Google Maps was able to tailor a feature launch to best fit each device with MPC.

Performance Class unblocks premium experience launch for Google Maps

Google Maps recently took advantage of the expanded device coverage enabled by the Play Services module to unblock a feature launch. Google Maps wanted to update their UI by increasing the transparency of some layers, which meant rendering more of the map. During the initial rollout, latency increased on many devices, especially toward the low end, and the team had to halt the launch. To investigate, the Maps team sliced an existing key metric, “seconds to UI item visibility”, by MPC level. This revealed that while all devices saw a small increase in this latency, devices without an MPC level had the largest increase.

A bar graph displays A/B test results for Seconds to UI item visibility, comparing control results with those using increased transparency across different Media Performance Class Levels.  A green horizontal line and text indicate the updated experience shipped to devices qualifying for MPC. A vertical green dotted line separates results for devices without a specific MPC level, which kept the previous UI.

With these results in hand, Google Maps started their rollout again, but this time only launching the feature on devices that report an MPC level. As devices continue to get updated and meet the bar for MPC, the updated Google Maps UI will be available to them as well.

The new Play Services module

MPC level requirements are defined in the Android Compatibility Definition Document (CDD), then devices and Android builds are validated against these requirements by the Android Compatibility Test Suite (CTS). The Play Services module of the Jetpack Core Performance library leverages these test results to continually update a device’s reported MPC level without any additional effort on your end. This also means that you’ll immediately have access to the MPC level for new device launches without needing to acquire and test each device yourself, since it already passed CTS. If the MPC level is not available from Google Play Services, the library will fall back to the MPC level declared by the OEM as a build constant.

A flowchart depicts the process of determining Performance Class levels for Android devices, involving manufacturers, CTS tests, a Grader, the Play Services module, and the CDD.

As of this writing, more than 190M in-market devices, covering over 500 models across 40+ brands, report an MPC level. This coverage will continue to grow over time as older devices, on Android 11 and up, update to newer builds.

Using the Core Performance library

To use Jetpack Core Performance, start by adding dependencies for the relevant modules to your Gradle configuration and creating an instance of DevicePerformance. Initializing a DevicePerformance instance should only happen once in your app, as early as possible - for example, in the onCreate() lifecycle event of your Application. In this example, we’ll use the Google Play services implementation of DevicePerformance.

// Implementation of Jetpack Core library.
implementation("androidx.core:core-ktx:1.12.0")
// Enable APIs to query for device-reported performance class.
implementation("androidx.core:core-performance:1.0.0")
// Enable APIs to query Google Play Services for performance class.
implementation("androidx.core:core-performance-play-services:1.0.0")

import androidx.core.performance.DevicePerformance
import androidx.core.performance.play.services.PlayServicesDevicePerformance

class MyApplication : Application() {
  lateinit var devicePerformance: DevicePerformance

  override fun onCreate() {
    // Use a class derived from the DevicePerformance interface
    devicePerformance = PlayServicesDevicePerformance(applicationContext)
  }
}

Then, later in your app when you want to retrieve the device’s MPC level, you can call getMediaPerformanceClass():

class MyActivity : Activity() {
  private lateinit var devicePerformance: DevicePerformance
  override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    // Note: Good app architecture is to use a dependency framework. See
    // https://developer.android.com/training/dependency-injection for more
    // information.
    devicePerformance = (application as MyApplication).devicePerformance
  }

  override fun onResume() {
    super.onResume()
    when {
      devicePerformance.mediaPerformanceClass >= Build.VERSION_CODES.UPSIDE_DOWN_CAKE -> {
        // MPC level 34 and later.
        // Provide the most premium experience for the highest performing devices.
      }
      devicePerformance.mediaPerformanceClass == Build.VERSION_CODES.TIRAMISU -> {
        // MPC level 33.
        // Provide a high quality experience.
      }
      else -> {
        // MPC level 31, 30, or undefined.
        // Remove extras to keep experience functional.
      }
    }
  }
}

Strategies for using Performance Class

MPC is intended to identify high-end devices, so you can expect to see MPC levels for the top devices from each year, which are the devices you’re likely to want to support for the longest time. For example, the Pixel 9 Pro launched with Android 14 and reports an MPC level of 34, the latest level definition at the time of its launch.

You should use MPC as a complement to any existing device clustering solutions you already use, such as querying a device’s static specs or manually blocklisting problematic devices. MPC can be a particularly helpful tool for new device launches: because new devices report an MPC level from day one, you can gauge their capabilities right from the start, without needing to acquire the hardware yourself or manually test each device.

A great first step to get involved is to include MPC levels in your telemetry. This can help you identify patterns in error reports or generally get a better sense of the devices your user base uses if you segment key metrics by MPC level. From there, you might consider using MPC as a dimension in your experimentation pipeline, for example by setting up A/B testing groups based on MPC level, or by starting a feature rollout with the highest MPC level and working your way down. As discussed previously, this is the approach that Google Maps took.

You could further use MPC to tune a user-facing feature, for example by adjusting the number of concurrent video playbacks your app attempts based on the MPC level’s concurrent codec guarantees. However, make sure to still query a device’s runtime capabilities when using this approach, as they may differ depending on the environment and state the device is in.
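
As a minimal sketch of that idea, assuming illustrative playback counts (they are not taken from the CDD’s actual concurrent codec requirements), you might pick a tier based on the reported level and then confirm it against runtime APIs such as MediaCodecInfo.CodecCapabilities.getMaxSupportedInstances():

import android.os.Build
import androidx.core.performance.DevicePerformance

// Illustrative tiers only; consult the CDD for each MPC level's actual
// concurrent codec guarantees, and verify against the device's runtime
// capabilities before starting playback.
fun chooseConcurrentPlaybacks(devicePerformance: DevicePerformance): Int = when {
    devicePerformance.mediaPerformanceClass >= Build.VERSION_CODES.UPSIDE_DOWN_CAKE -> 4
    devicePerformance.mediaPerformanceClass >= Build.VERSION_CODES.TIRAMISU -> 2
    else -> 1
}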

Get in touch!

If MPC sounds like it could be useful for your app, please give it a try! You can get started by taking a look at our sample code or documentation. We welcome you to share any questions or feedback you have in this short form.


This blog post is a part of Camera and Media Spotlight Week. We're providing resources – blog posts, videos, sample code, and more – all designed to help you uplevel the media experiences in your app.

To learn more about what Spotlight Week has to offer and how it can benefit you, be sure to read our overview blog post.

Get your apps ready for 16 KB page size devices

Posted by Yacine Rezgui – Developer Relations Engineer, Steven Moreland – Staff Software Engineer

Android is evolving to deliver even faster, more performant experiences. One key improvement is the adoption of a 16 KB memory page size. This change enables the operating system to manage memory more efficiently, leading to noticeable performance gains (5-10%) in both apps and games. We provided an in-depth technical explanation and highlighted the performance improvements in Adding 16 KB Page Size to Android.

To help you test your app in 16 KB mode, this functionality is available as a developer option on Google Pixel 8 and 9 series devices, and similar support is coming soon to devices from Samsung, Xiaomi, vivo, and other Android OEMs.

To ensure compatibility with 16 KB devices, apps that use native code, either directly or through libraries or SDKs, might require rebuilding. However, the transition is significantly easier than the previous shift from 32-bit to 64-bit architecture. The next generation of devices is on its way, with the first models supporting 16 KB page sizes expected to arrive in a couple of years; this article will guide you through the steps to prepare your apps for them.

Getting ready for 16 KB: SDK developers

If you develop your own SDKs and libraries, we encourage you to update them to be 16 KB page size compatible and test them on 16 KB devices as soon as possible. This will give app developers ample time to incorporate the necessary changes. Registering with Play SDK Console is a great way to ensure you receive advanced notices like these in the future and in a timely manner.

Getting ready for 16 KB: app developers with no native code

Apps written entirely in Kotlin or the Java programming language, with dependencies that are as well, will work as-is!

Getting ready for 16 KB: app developers with native code

To check if your app has native code, you can utilize tools like APK Analyzer in Android Studio. However, the only way to ensure app compatibility is to test.

Rebuild your app

To ensure your app works on devices with a 16 KB page size, follow these steps:

      1. Upgrade your tools: Start by upgrading to Android Gradle Plugin (AGP) 8.5.1 or higher. These updated tools incorporate the necessary 16 KB page size configuration for your App Bundle and the APKs generated from it using bundletool.

      2. Align your native code: If your app includes native code, use NDK version r28 or higher, or rebuild it with 16 KB page size alignment (one possible Gradle configuration is sketched after this list). You should also ensure that your native code does not rely on or hardcode the value of PAGE_SIZE.

      3. Update SDKs and libraries: Confirm that all SDKs and libraries used in your app are compatible with 16 KB page size. If necessary, contact the SDK or library developers for updated versions.
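
If you build your native code with CMake through the Android Gradle Plugin on NDK r27, one hedged way to opt in to 16 KB alignment is the NDK’s flexible page sizes flag, sketched below in the Gradle Kotlin DSL; on NDK r28 and higher, 16 KB alignment is the default and this flag is unnecessary.

// Module-level build.gradle.kts - a minimal sketch.
android {
    defaultConfig {
        externalNativeBuild {
            cmake {
                // Ask the NDK toolchain to produce 16 KB-aligned libraries.
                arguments += "-DANDROID_SUPPORT_FLEXIBLE_PAGE_SIZES=ON"
            }
        }
    }
}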

Test your app in 16 KB mode

To make sure your application does not assume a 4 KB page size anywhere, test it with a 16 KB page size emulator or virtual device in addition to your existing testing (with a 4 KB page size). This helps identify and resolve any compatibility issues from the move to 16 KB page sizes. You can also test on physical devices with the developer option, available on Pixel 8, 8a, and 8 Pro starting with Android 15 QPR1, and on Pixel 9, 9 Pro, and 9 Pro XL starting with the Android 15 QPR2 Beta 2, with more devices on the way.

The Future is Faster and More Efficient

The move to 16 KB page size benefits the Android ecosystem. It unlocks performance improvements, paves the way for future innovations, and provides users with smoother and richer app experiences.

We'll continue to provide updates and resources to help you through this transition. Start preparing your apps today to ensure you're ready for the future of Android!

Tools, not Rules: become a better Android developer with Compiler Explorer

Posted by Shai Barack – Android Platform Performance lead

Introducing Android support in Compiler Explorer

In a previous blog post you learned how Android engineers continuously improve the Android Runtime (ART) in ways that boost app performance on user devices. These changes to the compiler make system and app code faster or smaller. Developers don’t need to change their code and rebuild their apps to benefit from new optimizations, and users get a better experience. In this blog post I’ll take you inside the compiler with a tool called Compiler Explorer and witness some of these optimizations in action.

Compiler Explorer is an interactive website for studying how compilers work. It is an open source project that anyone can contribute to. This year, our engineers added support to Compiler Explorer for the Java and Kotlin programming languages on Android.

You can use Compiler Explorer to understand how your source code is translated to assembly language, and how high-level programming language constructs in a language like Kotlin become low-level instructions that run on the processor.

At Google our engineers use this tool to study different coding patterns for efficiency, to see how existing compiler optimizations work, to share new optimization opportunities, and to teach and learn. Learning is best when it’s done through tools, not rules. Instead of teaching developers to memorize different rules for how to write efficient code or what the compiler might or might not optimize, give the engineers the tools to find out for themselves what happens when they write their code in different ways, and let them experiment and learn. Let’s learn together!

Start by going to godbolt.org. By default we see C++ sample code, so click the dropdown that says C++ and select Android Java. You should see this sample code:

class Square {
   static int square(int num) {
       return num * num;
   }
}
screenshot of sample code in Compiler Explorer

On the left you’ll see a very simple program. You might say that this is a one line program. But this is not a meaningful statement in terms of performance - how many lines of code there are doesn’t tell us how long this program will take to run, or how much memory will be occupied by the code when the program is loaded.

On the right you’ll see a disassembly of the compiler output. This is expressed in terms of assembly language for the target architecture, where every line is a CPU instruction. Looking at the instructions, we can say that the implementation of the square(int num) method consists of 2 instructions in the target architecture. The number and type of instructions give us a better idea for how fast the program is than the number of lines of source code. Since the target architecture is AArch64 aka ARM64, every instruction is 4 bytes, which means that our program’s code occupies 8 bytes in RAM when the program is compiled and loaded.

Let’s take a brief detour and introduce some Android toolchain concepts.


The Android build toolchain (in brief)

a flow diagram of the Android build toolchain

When you write your Android app, you’re typically writing source code in the Java or Kotlin programming languages. When you build your app in Android Studio, it’s initially compiled by a language-specific compiler into language-agnostic JVM bytecode in a .jar. Then the Android build tools transform the .jar into Dalvik bytecode in .dex files, which is what the Android Runtime executes on Android devices. Typically developers use d8 in their Debug builds, and r8 for optimized Release builds. The .dex files go in the .apk that you push to test devices or upload to an app store. Once the .apk is installed on the user’s device, an on-device compiler which knows the specific target device architecture can convert the bytecode to instructions for the device’s CPU.

We can use Compiler Explorer to learn how all these tools come together, and to experiment with different inputs and see how they affect the outputs.

Going back to our default view for Android Java, on the left is Java source code and on the right is the disassembly for the on-device compiler dex2oat, the very last step in our toolchain diagram. The target architecture is ARM64 as this is the most common CPU architecture in use today by Android devices.

The ARM64 Instruction Set Architecture offers many instructions and extensions, but as you read disassemblies you will find that you only need to memorize a few key instructions. You can look for ARM64 Quick Reference cards online to help you read disassemblies.

At Google we study the output of dex2oat in Compiler Explorer for different reasons, such as:

    • Gaining intuition for what optimizations the compiler performs in order to think about how to write more efficient code.
    • Estimating how much memory will be required when a program with this snippet of code is loaded into memory.
    • Identifying optimization opportunities in the compiler - ways to generate instructions for the same code that are more efficient, resulting in faster execution or in lower memory usage without requiring app developers to change and rebuild their code.
    • Troubleshooting compiler bugs! 🐞

Compiler optimizations demystified

Let’s look at a real example of compiler optimizations in practice. In the previous blog post you can read about compiler optimizations that the ART team recently added, such as coalescing returns. Now you can see the optimization, with Compiler Explorer!

Let’s load this example:

class CoalescingReturnsDemo {
   String intToString(int num) {
       switch (num) {
           case 1:
               return "1";
           case 2:
               return "2";
           case 3:
               return "3";           
           default:
               return "other";
       }
   }
}
screenshot of sample code in Compiler Explorer

How would a compiler implement this code in CPU instructions? Every case would be a branch target, with a case body that has some unique instructions (such as referencing the specific string) and some common instructions (such as assigning the string reference to a register and returning to the caller). Coalescing returns means that some instructions at the tail of each case body can be shared across all cases. The benefits grow for larger switches, proportional to the number of cases.

You can see the optimization in action! Simply create two compiler windows, one for dex2oat from the October 2022 release (the last release before the optimization was added), and another for dex2oat from the November 2023 release (the first release after the optimization was added). You should see that before the optimization, the size of the method body for intToString was 124 bytes. After the optimization, it’s down to just 76 bytes.

This is of course a contrived example for simplicity’s sake. But this pattern is very common in Android code. For instance, consider an implementation of Handler.handleMessage(Message), where you might implement a switch statement over the value of Message#what, as sketched below.
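
Here is a hedged Kotlin sketch of that pattern (the message codes are invented for illustration); a when over Message.what compiles to the same kind of multi-way branch with shared return tails that this optimization can coalesce:

import android.os.Handler
import android.os.Looper
import android.os.Message

class StatusHandler : Handler(Looper.getMainLooper()) {
    override fun handleMessage(msg: Message) {
        // Each branch references a unique string constant; the instructions
        // that hand the result onward are tail code shared across branches.
        val status = when (msg.what) {
            MSG_CONNECTED -> "connected"
            MSG_DISCONNECTED -> "disconnected"
            else -> "unknown"
        }
        println(status)
    }

    companion object {
        // Hypothetical message codes.
        const val MSG_CONNECTED = 1
        const val MSG_DISCONNECTED = 2
    }
}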

How does the compiler implement optimizations such as this? Compiler Explorer lets us look inside the compiler’s pipeline of optimization passes. In a compiler window, click Add New > Opt Pipeline. A new window will open, showing the High-level Internal Representation (HIR) that the compiler uses for the program, and how it’s transformed at every step.

screenshot of the high-level internal representation (HIR) the compiler uses for the program in Compiler Explorer

If you look at the code_sinking pass you will see that the November 2023 compiler replaces Return HIR instructions with Goto instructions.

Most of the passes are hidden when Filters > Hide Inconsequential Passes is checked. You can uncheck this option and see all optimization passes, including ones that did not change the HIR (i.e. have no “diff” over the HIR).

Let’s study another simple optimization, and look inside the optimization pipeline to see it in action. Consider this code:

class ConstantFoldingDemo {
   static int demo(int num) {
       int result = num;
       if (num == 2) {
           result = num + 2;
       }
       return result;
   }
}

The above is functionally equivalent to the below:

class ConstantFoldingDemo {
   static int demo(int num) {
       int result = num;
       if (num == 2) {
           result = 4;
       }
       return result;
   }
}

Can the compiler make this optimization for us? Let’s load it in Compiler Explorer and turn to the Opt Pipeline Viewer for answers.

screenshot of Opt Pipeline Viewer in Compiler Explorer

The disassembly shows us that the compiler never bothers with “two plus two”; it knows that if num is 2, then result needs to be 4. This optimization is called constant folding. Inside the conditional block where we know that num == 2, we propagate the constant 2 into the symbolic name num, then fold num + 2 into the constant 4.

You can see this optimization happening over the compiler’s IR by selecting the constant_folding pass in the Opt Pipeline Viewer.

Kotlin and Java, side by side

Now that we’ve seen the instructions for Java code, try changing the language to Android Kotlin. You should see this sample code, the Kotlin equivalent of the basic Java sample we’ve seen before:

fun square(num: Int): Int = num * num
screenshot of sample code in Kotlin in Compiler Explorer

You will notice that the source code is different but the sample program is functionally identical, and so is the output from dex2oat. Finding the square of a number results in the same instructions, whether you write your source code in Java or in Kotlin.

You can take this opportunity to study interesting language features and discover how they work. For instance, let’s compare Java String concatenation with Kotlin String interpolation.

In Java, you might write your code as follows:

class StringConcatenationDemo {
   void stringConcatenationDemo(String myVal) {
       System.out.println("The value of myVal is " + myVal);
   }
}

Let’s find out how Java String concatenation actually works by trying this example in Compiler Explorer.

screenshot of sample code in Compiler Explorer

First you will notice that we changed the output compiler from dex2oat to d8. Reading Dalvik bytecode, which is the output from d8, is usually easier than reading the ARM64 instructions that dex2oat outputs. This is because Dalvik bytecode uses higher level concepts. Indeed you can see the names of types and methods from the source code on the left side reflected in the bytecode on the right side. Try changing the compiler to dex2oat and back to see the difference.

As you read the d8 output you may realize that Java String concatenation is actually implemented by rewriting your source code to use a StringBuilder. The source code above is rewritten internally by the Java compiler as follows:

class StringConcatenationDemo {
   void stringConcatenationDemo(String myVal) {
       StringBuilder sb = new StringBuilder();
       sb.append("The value of myVal is ");
       sb.append(myVal);
       System.out.println(sb.toString());
  }
}

In Kotlin, we can use String interpolation:

fun stringInterpolationDemo(myVal: String) {
   System.out.println("The value of myVal is $myVal")
}

The Kotlin syntax is easier to read and write, but does this convenience come at a cost? If you try this example in Compiler Explorer, you may find that the Dalvik bytecode output is roughly the same! In this case we see that Kotlin offers an improved syntax, while the compiler emits similar bytecode.

At Google we study examples of language features in Compiler Explorer to learn about how high-level language features are implemented in lower-level terms, and to better inform ourselves on the different tradeoffs that we might make in choosing whether and how to adopt these language features. Recall our learning principle: tools, not rules. Rather than memorizing rules for how you should write your code, use the tools that will help you understand the upsides and downsides of different alternatives, and then make an informed decision.

What happens when you minify your app?

Speaking of making informed decisions as an app developer, you should be minifying your apps with R8 when building your Release APK. Minifying generally does three things to optimize your app to make it smaller and faster:

      1. Dead code elimination: find all the live code (code that is reachable from well-known program entry points), which tells us that the remaining code is not used, and therefore can be removed.

      2. Bytecode optimization: various specialized optimizations that rewrite your app’s bytecode to make it functionally identical but faster and/or smaller.

      3. Obfuscation: renaming all types, methods, and fields in your program that are not accessed by reflection (and therefore can be safely renamed) from their names in source code (com.example.MyVeryLongFooFactorySingleton) to shorter names that fit in less memory (a.b.c).

Let’s see an example of all three benefits! Start by loading this view in Compiler Explorer.

screenshot of sample code in Kotlin in Compiler Explorer

First you will notice that we are referencing types from the Android SDK. You can do this in Compiler Explorer by clicking Libraries and adding Android API stubs.

Second, you will notice that this view has multiple source files open. The Kotlin source code is in example.kt, but there is another file called proguard.cfg.

-keep class MinifyDemo {
   public void goToSite(...);
}

Looking inside this file, you’ll see directives in the format of Proguard configuration flags, which is the legacy format for configuring what to keep when minifying your app. You can see that we are asking to keep a certain method of MinifyDemo. “Keeping” in this context means don’t shrink (we tell the minifier that this code is live). Let’s say we’re developing a library and we’d like to offer our customer a prebuilt .jar where they can call this method, so we’re keeping this as part of our API contract.

We set up a view that will let us see the benefits of minifying. On one side you’ll see d8, showing the dex code without minification, and on the other side r8, showing the dex code with minification. By comparing the two outputs, we can see minification in action:

      1. Dead code elimination: R8 removed all the logging code, since it never executes (as DEBUG is always false). It removed not just the calls to android.util.Log, but also the associated strings.

      2. Bytecode optimization: since the specialized methods goToGodbolt, goToAndroidDevelopers, and goToGoogleIo just call goToUrl with a hardcoded parameter, R8 inlined the calls to goToUrl into the call sites in goToSite. This inlining saves us the overhead of defining a method, invoking the method, and returning from the method.

      3. Obfuscation: we told R8 to keep the public method goToSite, and it did. R8 also decided to keep the method goToUrl as it’s used by goToSite, but you’ll notice that R8 renamed that method to a. This method’s name is an internal implementation detail, so obfuscating its name saved us a few precious bytes.

You can use R8 in Compiler Explorer to understand how minification affects your app, and to experiment with different ways to configure R8.

At Google our engineers use R8 in Compiler Explorer to study how minification works on small samples. The authoritative tool for studying how a real app compiles is the APK Analyzer in Android Studio, as optimization is a whole-program problem and a snippet might not capture every nuance. But iterating on release builds of a real app is slow, so studying sample code in Compiler Explorer helps our engineers quickly learn and iterate.

Google engineers build very large apps that are used by billions of people on different devices, so they care deeply about these kinds of optimizations and strive to make the most of optimizing tools. But because many of our apps are so large, changing the configuration and rebuilding takes a very long time. Our engineers can now use Compiler Explorer to experiment with minification under different configurations and see results in seconds, not minutes.

You may wonder what would happen if we renamed goToSite in our code. Unfortunately, our build would break unless we also renamed the reference to that method in the Proguard flags. Fortunately, R8 now natively supports Keep Annotations as an alternative to Proguard flags. We can modify our program to use Keep Annotations:

@UsedByReflection(kind = KeepItemKind.CLASS_AND_METHODS)
public static void goToSite(Context context, String site) {
    ...
}

Here is the complete example. You’ll notice that we removed the proguard.cfg file, and under Libraries we added “R8 keep-annotations”, which is how we’re importing @UsedByReflection.

At Google our engineers prefer annotations over flags. Here we’ve seen one benefit of annotations: keeping the information about the code in one place rather than two makes refactors easier. Another is that annotations are self-documenting. For instance, if this method were actually kept because it’s called from native code, we would annotate it as @UsedByNative instead.

Baseline profiles and you

Lastly, let’s touch on baseline profiles. So far you’ve seen some demos where we looked at dex code, and others where we looked at ARM64 instructions. If you toggle between the different formats, you will notice that the high-level dex bytecode is much more compact than low-level CPU instructions. There is an interesting tradeoff to explore here: whether, and when, to compile bytecode to CPU instructions.

For any program method, the Android Runtime has three compilation options:

      1. Compile the method Just in Time (JIT).

      2. Compile the method Ahead of Time (AOT).

      3. Don’t compile the method at all, instead use a bytecode interpreter.

Running code in an interpreter is an order of magnitude slower, but doesn’t incur the cost of loading the representation of the method as CPU instructions, which, as we’ve seen, is more verbose. Interpretation is best used for “cold” code - code that runs only once and is not critical to user interactions.

When ART detects that a method is “hot”, it will be JIT-compiled if it hasn’t already been AOT-compiled. JIT compilation accelerates execution times, but pays the one-time cost of compilation during app runtime. This is where baseline profiles come in. Using baseline profiles, you as the app developer can give ART a hint as to which methods are going to be hot or otherwise worth compiling. ART will use that hint before runtime, compiling the code AOT (usually at install time, or when the device is idle) rather than at runtime. This is why apps that use Baseline Profiles see faster startup times.

With Compiler Explorer we can see Baseline Profiles in action.

Let’s open this example.

screenshot of sample code in Compiler Explorer

The Java source code has two method definitions, factorial and fibonacci. This example is set up with a manual baseline profile, listed in the file profile.prof.txt. You will notice that the profile only references the factorial method. Consequently, the dex2oat output will only show compiled code for factorial, while fibonacci appears in the output with no instructions and a size of 0 bytes.

In the context of compilation modes, this means that factorial is compiled AOT, while fibonacci will be JIT-compiled or interpreted, because we applied a different compiler filter via the profile. This is reflected in the dex2oat output, which reads “Compiler filter: speed-profile” (AOT compile only profiled code), where previous examples read “Compiler filter: speed” (AOT compile everything).
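
For reference, human-readable baseline profile rules look like the hedged sketch below (the class name is hypothetical); the leading flags mark a method as Hot, used during Startup, and/or used Post-startup, followed by its dex method descriptor:

HSPLcom/example/MathDemo;->factorial(I)I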

Conclusion

Compiler Explorer is a great tool for understanding what happens after you write your source code but before it can run on a target device. The tool is easy to use, interactive, and shareable. Compiler Explorer is best used with sample code, but it goes through the same procedures as building a real app, so you can see the impact of all steps in the toolchain.

By learning how to use tools like this to discover how the compiler works under the hood, rather than memorizing a bunch of rules of optimization best practices, you can make more informed decisions.

Now that you've seen how to use the Java and Kotlin programming languages and the Android toolchain in Compiler Explorer, you can level up your Android development skills.

Lastly, don't forget that Compiler Explorer is an open source project on GitHub. If there is a feature you'd like to see then it's just a Pull Request away.


Java and OpenJDK are trademarks or registered trademarks of Oracle and/or its affiliates.

Adding 16 KB Page Size to Android

Posted by Steven Moreland – Staff Software Engineer, Sandeep Patil – Principal Software Engineer

A page is the granularity at which an operating system manages memory. Most CPUs today support a 4 KB page size and so the Android OS and applications have historically been built and optimized to run with a 4 KB page size. ARM CPUs support the larger 16 KB page size. When Android uses this larger page size, we observe an overall performance boost of 5-10% while using ~9% additional memory.

In order to improve the operating system performance overall and to give device manufacturers an option to make this trade-off, Android 15 can run with 4 KB or 16 KB page sizes.

The very first 16 KB enabled Android system will be made available on select devices as a developer option, so you can use the developer option to test and fix your applications (if needed) and prepare for Android devices with 16 KB page sizes in the near future.

Details

In most CPUs, dedicated hardware called memory management units (MMUs) translate addresses from what a program is using to a physical location in memory. This translation is done on a page-size basis. Every time a program needs more memory, the operating system needs to get involved and fill out a “page table” entry, assigning that piece of memory to a process. When the page size is 4 times larger, there is 4 times less bookkeeping. So, the system can spend more time making sure your videos look great, games play well, and applications run smoothly, and less time filling out low-level operating system paperwork.

Unlike 32-bit/64-bit mode, a page size is not an Application Binary Interface (ABI). In other words, once an application is fixed to be page size agnostic, the same application binary can run on both 4 KB and 16 KB devices.

In Android 15, we’ve refactored Android from the ground up to support running at different page sizes, thus making it page-size agnostic.

Major OS Changes

On new Android 15 based devices:

    • All OS binaries are 16 KB aligned (-Wl,-z,max-page-size=16384). 3rd party applications / libraries may not be 16 KB aligned.
    • All OS binaries are built with separate loadable segments (-Wl,-z,separate-loadable-segments) to ensure all memory regions mapped into a process are readable, which some applications depend on.

Many of our other OS components have been rewritten to avoid assuming the page size and to optimize for larger page size when available.

Filesystems

For performant operation, file system block size must match the page size. EROFS and F2FS file systems have been made 16 KB compatible, as has the UFS storage layer.

On 4 KB systems, ELF executable file size increases due to additional padding added for 16 KB alignment (-Wl,-z,max-page-size=16384 option), but several optimizations help us avoid this cost.

  1. Sparse read-only file systems ensure that zero pages created for additional padding for 16 KB alignment are not written to disk. For example, EROFS knows a certain range of a file is zero filled, and it will not need to do any IO if this part of the file is accessed.
  2. Read-writeable file systems handle zero pages on a case-by-case basis. For example, in Android 15, PackageManager reclaims this space for files installed as part of applications.

Memory Management

  1. The Linux page cache has been modified not to read ahead for these extra padding spaces, thereby saving unnecessary memory load.
  2. These pages are blank padding, and programs never read this. It’s the space in-between usable parts of the program, purely for alignment reasons.

Linux Kernel

The Linux kernel is deeply tied to a specific page size, so we must choose which page size to use when building the kernel, while the rest of the operating system remains the same.

Android Applications

All applications with native code or dependencies need to be recompiled for compatibility with 16 KB page size devices.

Since most native code within Android applications and SDKs has been built with a 4 KB page size in mind, it needs to be re-aligned to 16 KB so the binaries are compatible with both 4 KB and 16 KB devices. For most applications and SDKs, this is a 2 step process:

  1. Rebuild the native code with 16 KB alignment.
  2. Test and fix on a 16 KB device/emulator in case there are hardcoded assumptions about page size (a runtime check is sketched below).
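
As a minimal sketch for step 2, you can log the page size the process is actually running with, using only the platform’s android.system.Os API:

import android.system.Os
import android.system.OsConstants
import android.util.Log

// Logs 4096 on a 4 KB device and 16384 in 16 KB mode, confirming which
// configuration your test device or emulator is actually using.
fun logPageSize() {
    val pageSize = Os.sysconf(OsConstants._SC_PAGESIZE)
    Log.d("PageSize", "Current page size: $pageSize bytes")
}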

Please see our developer documentation for more information.

NOTE: If you are an SDK or tools developer, you should add 16 KB support as soon as possible so that applications can work on 16 KB using your SDK or tools.

Developing for 16 KB devices

There are no production Android devices available today, or expected for the Android 15 release, that support a 16 KB page size. To address this, we are working with our partners to make a developer option available on existing devices. This developer option is meant for application development and testing. We are also making a 16 KB emulator target available for developers in Android Studio.

16 KB Developer option on device

In Android 15, we implemented a developer option that lets users switch between 16 KB and 4 KB page size on the device in order to test their application with either of the page sizes. This option is available on Pixel 8 and Pixel 8 Pro starting in the Android 15 QPR1 Beta, and we're collaborating closely with SoC and OEM partners to enable the option on additional devices soon.

screen grab of 16KB developer option on device

When built for 16 KB pages, the same binary will work on both 4 KB and 16 KB devices; however, the Linux kernel has to be separate. To solve this problem, we’ve added a way to include an extra kernel that you can switch to as a developer option. The kernels are incrementally compressed, with one copy for each page size, and take ~12-16 MB of space on disk.

Using the 16 KB developer option requires an unlocked bootloader and a one-time device wipe. After flashing, developers can switch between 4 KB and 16 KB mode by toggling the developer option over a reboot.

If you are a device manufacturer or SoC developer, see our instructions on how to enable and use this.

16 KB on x86_64 desktops

While 16 KB pages are an ARM-only feature, we recognize that many developers are using emulators on x86_64 hardware. In order to bridge this gap for developers, we’ve added support to emulate 16 KB page size for applications on x86_64 emulators. In this mode, the Kernel runs in 4 KB mode, but all addresses exposed to applications are aligned to 16 KB, and arguments to function calls such as mmap(...MAP_FIXED...) are verified to be 16 KB aligned.

To get started, you can download and run the 16 KB pages emulator inside the Android Studio SDK manager. This way, even if you don’t have access to ARM hardware, you can still ensure your applications will work with 16 KB page size.

16 KB pages emulator inside the Android Studio SDK manager

Future

In this post, we’ve discussed the technical details of how we are restructuring memory in Android to get faster, more performant devices. Android 15 and AOSP work with 16 KB pages, and devices can now implement 16 KB pages as a development option. This required changes from the bottom to the top of the operating system, in our development tooling, and throughout the Android ecosystem.

We look forward to application and SDK developers taking advantage of these options and preparing for more performant and efficient Android devices in the near future.

Optimizing gVisor filesystems with Directfs

gVisor is a sandboxing technology that provides a secure environment for running untrusted code. In our previous blog post, we discussed how gVisor performance improves with a root filesystem overlay. In this post, we'll dive into another filesystem optimization that was recently launched: directfs. It gives gVisor’s application kernel (the Sentry) secure direct access to the container filesystem, avoiding expensive round trips to the filesystem gofer.

Origins of the Gofer

gVisor is used internally at Google to run a variety of services and workloads. One of the challenges we faced while building gVisor was providing remote filesystem access securely to the sandbox. gVisor’s strict security model and defense in depth approach assumes that the sandbox may get compromised because it shares the same execution context as the untrusted application. Hence the sandbox cannot be given sensitive keys and credentials to access Google-internal remote filesystems.

To address this challenge, we added a trusted filesystem proxy called a “gofer”. The gofer runs outside the sandbox and provides a secure interface for untrusted containers to access such remote filesystems. For architectural simplicity, gofers were used to serve local filesystems as well as remote ones.

Gofer process intermediates filesystem operations

Isolating the Container Filesystem in runsc

When gVisor was open sourced as runsc, the same gofer model was copied over to maintain the same security guarantees. runsc was configured to start one gofer process per container which serves the container filesystem to the sandbox over a predetermined protocol (now LISAFS). However, a gofer adds a layer of indirection with significant overhead.

This gofer model (built for remote filesystems) brings very few advantages for the runsc use-case, where all the filesystems served by the gofer (like rootfs and bind mounts) are mounted locally on the host. The gofer directly accesses them using filesystem syscalls.

Linux provides some security primitives to effectively isolate local filesystems. These include mount namespaces, pivot_root, and detached bind mounts[1]. Directfs is a new filesystem access mode that uses these primitives to expose the container filesystem to the sandbox in a secure manner. The sandbox’s view of the filesystem tree is limited to just the container filesystem. The sandbox process is not given access to anything mounted on the broader host filesystem. Even if the sandbox gets compromised, these mechanisms provide additional barriers to prevent broader system compromise.

Directfs

In directfs mode, the gofer still exists as a cooperative process outside the sandbox. As usual, the gofer enters a new mount namespace, sets up appropriate bind mounts to create the container filesystem in a new directory and then pivot_root(2)s into that directory. Similarly, the sandbox process enters new user and mount namespaces and then pivot_root(2)s into an empty directory to ensure it cannot access anything via path traversal. But instead of making RPCs to the gofer to access the container filesystem, the sandbox requests the gofer to provide file descriptors to all the mount points via SCM_RIGHTS messages. The sandbox then directly makes file-descriptor-relative syscalls (e.g. fstatat(2), openat(2), mkdirat(2), etc) to perform filesystem operations.

Sandbox directly accesses container filesystem with directfs

Earlier when the gofer performed all filesystem operations, we could deny all these syscalls in the sandbox process using seccomp. But with directfs enabled, the sandbox process's seccomp filters need to allow the usage of these syscalls. Most notably, the sandbox can now make openat(2) syscalls (which allow path traversal), but with certain restrictions: O_NOFOLLOW is required, no access to procfs and no directory FDs from the host. We also had to give the sandbox the same privileges as the gofer (for example CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH), so it can perform the same filesystem operations.

It is noteworthy that only the trusted gofer provides FDs (of the container filesystem) to the sandbox. The sandbox cannot walk backwards (using ‘..’) or follow a malicious symlink to escape out of the container filesystem. In effect, we've decreased our dependence on the syscall filters to catch bad behavior, but correspondingly increased our dependence on Linux's filesystem isolation protections.

Performance

Making RPCs to the gofer for every filesystem operation adds a lot of overhead to runsc. Hence, avoiding gofer round trips significantly improves performance. Let’s find out what this means for some of our benchmarks. We will run the benchmarks using our newly released systrap platform on bind mounts (as opposed to rootfs). This simulates more realistic use cases, because bind mounts are extensively used while configuring filesystems in containers. Bind mounts also do not have an overlay (like the rootfs mount), so all operations go through the goferfs / directfs mount.

Let's first look at our stat micro-benchmark, which repeatedly calls stat(2) on a file.

Stat benchmark improvement with directfs
The stat(2) syscall is more than 2x faster! However, since this is not representative of real-world applications, we should not extrapolate these results. So let's look at some real-world benchmarks.
Real-world workload benchmark improvements with directfs
We see a 12% reduction in the absolute time to run these workloads and 17% reduction in Ruby load time!

Conclusion

The gofer model in runsc was overly restrictive for accessing host files. We were able to leverage existing filesystem isolation mechanisms in Linux to bypass the gofer without compromising security. Directfs significantly improves performance for certain workloads. This is part of our ongoing efforts to improve gVisor performance. You can learn more about gVisor at gvisor.dev. You can also use gVisor in GKE with GKE Sandbox. Happy sandboxing!


[1] Detached bind mounts can be created by first creating a bind mount using mount(MS_BIND) and then detaching it from the filesystem tree using umount(MNT_DETACH).


By Ayush Ranjan, Software Engineer – Google

Google I/O 2023: What’s new in Jetpack

Posted by Amanda Alexander, Product Manager, Android

Android Jetpack is a key pillar of Modern Android Development. It is a suite of over 100 libraries, tools, and guidance to help developers follow best practices, reduce boilerplate code, and write code that works consistently across Android versions and devices, so that you can focus on building unique features for your app. The majority of apps on Google Play rely on Jetpack; in fact, over 90% of the top 1000 apps use it.

Below we’ll cover highlights of recent updates in three major areas of Jetpack:

  • Architecture Libraries and Guidance
  • Performance Optimization of Applications
  • User Interface Libraries and Guidance

And then conclude with some additional key updates.


1. Architecture Libraries and Guidance

App architecture libraries and components ensure that apps are robust, testable, and maintainable.

Data Persistence

Most applications need to persist local state - whether it be caching results, managing local lists of user-entered data, or powering data returned in the UI. Room is the recommended data persistence layer, which provides an abstraction layer over SQLite, allowing for increased usability and safety over using the platform directly.

In Room, we have added many brand-new features, such as the Upsert operation, which attempts to insert an entity when there is no uniqueness conflict or updates the entity if there is a conflict, and support for using Kotlin value classes with KSP. These new features are available in Room 2.6-alpha, with all library sources written in Kotlin; the library supports both the Java programming language and Kotlin code generation.
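
As a hedged sketch of the Upsert operation (the entity and DAO are invented for illustration):

import androidx.room.Dao
import androidx.room.Entity
import androidx.room.PrimaryKey
import androidx.room.Upsert

@Entity
data class Song(
    @PrimaryKey val id: Long,
    val title: String,
)

@Dao
interface SongDao {
    // Inserts the song if its id is new; otherwise updates the existing row.
    @Upsert
    suspend fun upsertSong(song: Song)
}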

Managing tasks with WorkManager

The WorkManager library makes it easy to schedule deferrable, asynchronous tasks that must run reliably, for instance uploading backups or analytics. These APIs let you create a task and hand it off to WorkManager to run when the work constraints are met.

Now, WorkManager allows you to update a WorkRequest after you have already enqueued it. This is often necessary in larger apps that frequently change constraints or need to update their workers on the fly. As of WorkManager 2.8.0, the updateWork() API is the means of doing this without manually canceling and enqueuing a new WorkRequest. This greatly simplifies the development process.
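
As a hedged sketch (UploadWorker and the constraint change are illustrative), the key detail is that the updated request must reuse the ID of the request already enqueued:

import android.content.Context
import androidx.work.Constraints
import androidx.work.NetworkType
import androidx.work.OneTimeWorkRequestBuilder
import androidx.work.WorkManager
import androidx.work.Worker
import androidx.work.WorkerParameters
import java.util.UUID

// Hypothetical worker, defined only so the sketch is self-contained.
class UploadWorker(context: Context, params: WorkerParameters) : Worker(context, params) {
    override fun doWork(): Result = Result.success()
}

fun relaxNetworkConstraint(context: Context, existingWorkId: UUID) {
    val updatedRequest = OneTimeWorkRequestBuilder<UploadWorker>()
        // Reusing the ID tells WorkManager which enqueued work to update.
        .setId(existingWorkId)
        .setConstraints(
            Constraints.Builder()
                .setRequiredNetworkType(NetworkType.CONNECTED)
                .build()
        )
        .build()
    WorkManager.getInstance(context).updateWork(updatedRequest)
}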

DataStore

The DataStore library is a robust data storage solution that addresses issues with SharedPreferences and provides a modern coroutines-based API.

In DataStore 1.1 alpha we added a widely requested feature: multi-process support, which allows you to access the DataStore from multiple processes while providing data consistency guarantees between them. Additional features include a new storage interface that enables the underlying storage mechanism for DataStore to be switched out (we have provided implementations for java.io and okio), and support for Kotlin Multiplatform.

Lifecycle management

Lifecycle-aware components perform actions in response to a change in the lifecycle status of another component, such as activities and fragments. These components help you produce better-organized, and often lighter-weight code, that is easier to maintain.

We released a stable version of Lifecycle 2.6.0 that includes more Compose integration. We added a new extension method on Flow, collectAsStateWithLifecycle(), which collects from flows and represents the latest value as Compose State in a lifecycle-aware manner. Additionally, a large number of classes have been converted to Kotlin while retaining binary compatibility with previous versions.
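
A hedged sketch of collecting a ViewModel’s StateFlow in a lifecycle-aware manner (the ViewModel and its uiState are invented for illustration):

import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.runtime.getValue
import androidx.lifecycle.ViewModel
import androidx.lifecycle.compose.collectAsStateWithLifecycle
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow

class WeatherViewModel : ViewModel() {
    private val _uiState = MutableStateFlow("Loading...")
    val uiState: StateFlow<String> = _uiState
}

@Composable
fun WeatherScreen(viewModel: WeatherViewModel) {
    // Collection pauses when the lifecycle drops below STARTED, so no work
    // is wasted while the app is in the background.
    val uiState by viewModel.uiState.collectAsStateWithLifecycle()
    Text(uiState)
}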

Predictive Back Gesture

moving image illustrating predictive back texture

In Android 13, we introduced a predictive back gesture for Android devices such as phones, large screens, and foldables. It is part of a multi-year release; when fully implemented, this feature will let users preview the destination or other result of a back gesture before fully completing it, allowing them to decide whether to continue or stay in the current view.

The Activity APIs for Predictive Back for Android are stable, and we have updated the best practices for using the supported system back callbacks: BackHandler (for Compose), OnBackPressedCallback, or OnBackInvokedCallback. We are excited to see Google apps adopt Predictive Back, including Play Store, Calendar, News, and TV!

In the Activity 1.8 alpha releases, the OnBackPressedCallback class now contains new Predictive Back progress callbacks for handling the back gesture starting, progress throughout the gesture, and the back gesture being canceled, in addition to the previous handleOnBackPressed() callback for when the back gesture is committed. We also added ComponentActivity.enableEdgeToEdge() to easily set up the edge-to-edge display in a backward-compatible manner.
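
As a hedged sketch of these callbacks (the animation hooks are placeholder comments):

import androidx.activity.BackEventCompat
import androidx.activity.OnBackPressedCallback

val callback = object : OnBackPressedCallback(true) {
    override fun handleOnBackStarted(backEvent: BackEventCompat) {
        // The back gesture started: begin preparing a preview of the destination.
    }

    override fun handleOnBackProgressed(backEvent: BackEventCompat) {
        // Drive the preview animation with the gesture's progress (0f..1f).
    }

    override fun handleOnBackPressed() {
        // The gesture was committed: complete the back navigation.
    }

    override fun handleOnBackCancelled() {
        // The gesture was cancelled: rewind the preview animation.
    }
}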

Activity updates for more consistent Photo Picker experience

The Android photo picker is a browsable interface that presents the user with their media library. In Activity 1.7.0, the Photo Picker activity contracts have been updated to contain an additional fallback that allows OEMs and system apps, such as Google Play services, to provide a consistent Photo Picker experience on a wider range of Android devices and API levels by implementing the fallback action. Read more in the Photo Picker Everywhere blog.

Incremental Data Fetching

The Paging library allows you to load and display small chunks of data to improve network and system resource consumption. App data can be loaded gradually and gracefully within RecyclerViews or Compose lazy lists.

In Paging Compose 1.0.0-alpha19, there is support for all lazy layouts, including custom layouts provided by the Wear and TV libraries. To support more lazy layouts, Paging Compose now provides slightly lower level extension methods on LazyPagingItems, itemKey and itemContentType. These APIs focus on helping you implement the key and contentType parameters to the standard items APIs that already exist for LazyColumn and LazyVerticalGrid, as well as their equivalents in APIs like HorizontalPager. While these changes make the LazyColumn and LazyRow examples a few lines longer, they provide consistency across all lazy layouts.
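
As a hedged sketch of itemKey and itemContentType in a LazyColumn (the Article type is invented for illustration):

import androidx.compose.foundation.lazy.LazyColumn
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.paging.compose.LazyPagingItems
import androidx.paging.compose.itemContentType
import androidx.paging.compose.itemKey

data class Article(val id: Long, val title: String)

@Composable
fun ArticleList(articles: LazyPagingItems<Article>) {
    LazyColumn {
        items(
            count = articles.itemCount,
            // Stable keys let Compose track items across page loads.
            key = articles.itemKey { it.id },
            contentType = articles.itemContentType { "article" }
        ) { index ->
            // Placeholders may be null before their page loads.
            Text(articles[index]?.title ?: "Loading...")
        }
    }
}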


2. Performance Optimization of Applications

Using performance libraries allows you to build performant apps and identify optimizations to maintain high performance, resulting in better end-user experiences.

Improving Start-up Times

Baseline Profiles allow you to partially compile your app at install time to improve runtime and launch performance, and are getting big improvements in new tooling and libraries:

Jetpack provides a new Baseline Profile Gradle Plugin in alpha, which supports AGP 8.0+ and can be easily added to your project in Studio Hedgehog (now in canary). The plugin automates the tasks of generating profiles, pulling them from the device, and integrating them into your build, either periodically or as part of your release process.

The plugin also allows you to easily automate the new Dex Layout Optimization feature in AGP 8.1, which lets you define BaselineProfileRule tests that collect classes used during startup, and move them to the primary dex file in a multidex app to increase locality. In a large app, this can improve cold startup time by 30% on top of Baseline Profiles!

Macrobenchmark 1.2 has shipped a lot of new features in alpha, such as Power metrics and Custom trace metrics, generation of Baseline Profiles without root on Android 13, and recompilation without clearing app data on Android 14.

You can read everything in depth in the blog "What's new in Android Performance".


3. User Interface Libraries and Guidance

Several changes have been made to our UI libraries to provide better support for large-screen compatibility, foldables, and emojis.

Jetpack Compose

Jetpack Compose, Android’s modern toolkit for building native UI, recently had its May 2023 release, which includes new features for text and layouts, continued performance improvements, enhanced tooling support, increased support for large screens, and updated guidance. Read the What’s New in Jetpack Compose I/O blog to learn more.

Glance

The Glance library, now in 1.0-beta, lets you develop app widgets optimized for Android phone, tablet, and foldable homescreens using Jetpack Compose. The library gives you the latest Android widget improvements out of the box, using Kotlin and Compose.

Compose for TV

With the alpha release of the TV library, you can now build experiences for Android TV using components optimized for the living room experience. Compose for TV unlocks all the benefits of Jetpack Compose for your TV apps, allowing you to build apps with less code, easier maintenance and a modern Material 3 look straight out of the box. See the Compose for TV blog for details.

Material 3 for Compose

Material Design 3 is the next evolution of Material Design, enabling you to build expressive, spirited and personal apps. It is the recommended design system for Android apps and the 1.1 stable release brings exciting new features such as bottom sheets, date and time pickers, search bars, tooltips, and added more motion and interaction support. Read more in the release blog.

Understanding Window State

The new WindowManager library helps developers adapt their apps to support multi-window environments and new device form factors by providing a common API surface with support back to API level 14.

In 1.1.0-beta01, new features and capabilities have been added to activity embedding and window layout that enable you to optimize your multi-activity apps for large screens. With the 1.1 release of Jetpack WindowManager, activity embedding APIs are no longer experimental and are recommended for multi-activity applications to provide improved large screen layouts. Check out the What’s new in WindowManager 1.1.0-beta01 blog for details and migration steps.


Other key updates

Kotlin Multiplatform

We continue to experiment with using Kotlin Multiplatform to share business logic between Android and iOS. Collections 1.3.0-alpha03 and DataStore 1.1.0-alpha02 have been updated, so you can now use these libraries in KMM projects. If you are using Kotlin Multiplatform in your app, we would like your feedback!

This was a look at all the changes in Jetpack over the past few months to help you build apps more productively. For more details on each Jetpack library, check out the AndroidX release notes, quickly find relevant libraries with the API picker and watch the Google I/O talks for additional highlights.

Java is a trademark or registered trademark of Oracle and/or its affiliates.

gVisor improves performance with root filesystem overlay

Overview

Container technology is an integral part of modern application ecosystems, making container security an increasingly important topic. Since containers are often used to run untrusted, potentially malicious code, it is imperative to secure the host machine from the container.

A container's security depends on its security boundaries, such as user namespaces (which isolate security-related identifiers and attributes), seccomp rules (which restrict the syscalls available), and Linux Security Module configuration. Popular container management products like Docker and Kubernetes relax these and other security boundaries to increase usability, which means that users need additional container security tools to provide a much stronger isolation boundary between the container and the host.

The gVisor open source project, developed by Google, provides an OCI compatible container runtime called runsc. It is used in production at Google to run untrusted workloads securely. Runsc (run sandbox container) is compatible with Docker and Kubernetes and runs containers in a gVisor sandbox. The gVisor sandbox has an application kernel, written in Go, that implements a substantial portion of the Linux system call interface. All application syscalls are intercepted by the sandbox and handled in the user-space kernel.

Although gVisor does not introduce large fixed overheads, sandboxing does add some performance overhead to certain workloads. gVisor has made several improvements recently that help containerized applications run faster inside the sandbox, including an improvement to the container root filesystem, which we will dive deeper into.

Costly Filesystem Access in gVisor

gVisor uses a trusted filesystem proxy process (“gofer”) to access the filesystem on behalf of the sandbox. The sandbox process is considered untrusted in gVisor’s security model. As a result, it is not given direct access to the container filesystem and its seccomp filters do not allow filesystem syscalls.

In gVisor, the container rootfs and bind mounts are configured to be served by a gofer.

Gofer mounts configuration in gVisor

When the container needs to perform a filesystem operation, it makes an RPC to the gofer which makes host system calls and services the RPC. This is quite expensive due to:

  1. RPC cost: This is the cost of communicating with the gofer process, including process scheduling, message serialization and IPC system calls.
    • To ameliorate this, gVisor recently developed a purpose-built protocol called LISAFS which is much more efficient than its predecessor.
    • gVisor is also experimenting with giving the sandbox direct access to the container filesystem in a secure manner. This would essentially nullify RPC costs as it avoids the gofer being in the critical path of filesystem operations.
  2. Syscall cost: This is the cost of making the host syscall which actually accesses/modifies the container filesystem. Syscalls are expensive, because they perform context switches into the kernel and back into userspace.
    • To help with this, gVisor heavily caches the filesystem tree in memory. So operations like stat(2) on cached files are serviced quickly. But other operations like mkdir(2) or rename(2) still need to make host syscalls.

Container Root Filesystem

In Docker and Kubernetes, the container’s root filesystem (rootfs) is based on the filesystem packaged with the image. The image’s filesystem is immutable. Any change a container makes to the rootfs is stored separately and is destroyed with the container. This way, the image’s filesystem can be shared efficiently with all containers running the same image. This is different from bind mounts, which allow containers to access the bound host filesystem tree. Changes to bind mounts are always propagated to the host and persist after the container exits.

Docker and Kubernetes both use the overlay filesystem by default to configure container rootfs. Overlayfs mounts are composed of one upper layer and multiple lower layers. The overlay filesystem presents a merged view of all these filesystem layers at its mount location and ensures that lower layers are read-only while all changes are held in the upper layer. The lower layer(s) constitute the “image layer” and the upper layer is the “container layer”. When the container is destroyed, the upper layer mount is destroyed as well, discarding the root filesystem changes the container may have made. Docker’s overlayfs driver documentation has a good explanation.

Rootfs Configuration Before

Let’s consider an example where the image has files foo and baz, and the container overwrites foo and creates a new file bar. The diagram below shows how the root filesystem used to be configured in gVisor: we went through the gofer to access/mutate the overlaid directory on the host. It also shows the state of the host overlay filesystem.

Rootfs configuration in gVisor earlier

Opportunity! Sandbox Internal Overlay

Given that the upper layer is destroyed with the container and that it is expensive to access/mutate a host filesystem from the sandbox, why keep the upper layer on the host at all? Instead we can move the upper layer into the sandbox.

The idea is to overlay the rootfs using a sandbox-internal overlay mount. We can use a tmpfs upper (container) layer and a read-only lower layer served by the gofer client. Any changes to rootfs would be held in tmpfs (in-memory). Accessing/mutating the upper layer would not require any gofer RPCs or syscalls to the host. This really speeds up filesystem operations on the upper layer, which contains newly created or copied-up files and directories.

Using the same example as above, the following diagram shows what the rootfs configuration would look like using a sandbox-internal overlay.

Rootfs configuration in gVisor with internal overlay

Host-Backed Overlay

The tmpfs mount by default will use the sandbox process’s memory to back all the file data in the mount. This can cause sandbox memory usage to blow up and exhaust the container’s memory limits, so it’s important to store all file data from the tmpfs upper layer on disk instead. We need to have a tmpfs-backing “filestore” on the host filesystem. Using the example from above, this filestore on the host will store file data for foo and bar.

This essentially flattens all regular files in tmpfs into one host file. The sandbox can mmap(2) the filestore into its address space, which allows it to access and mutate the filestore very efficiently, without incurring gofer RPC or syscall overheads.

Self-Backed Overlay

In Kubernetes, you can set local ephemeral storage limits. The upper layer of the rootfs overlay (the writable container layer) on the host contributes towards this limit. The kubelet enforces this limit by traversing the entire upper layer, stat(2)-ing all files and summing up their stat.st_blocks * block_size. If we move the upper layer into the sandbox, then the host upper layer is empty and the kubelet will not be able to enforce these limits.

To address this issue, we introduced “self-backed” overlays, which create the filestore in the host upper layer. This way, when the kubelet scans the host upper layer, the filestore will be detected and its stat.st_blocks should be representative of the total file usage in the sandbox-internal upper layer. It is also important to hide this filestore from the containerized application to avoid confusing it. We do so by creating a whiteout in the sandbox-internal upper layer, which blocks this file from appearing in the merged directory.

The following diagram shows what rootfs configuration would finally look like today in gVisor.

Rootfs configuration in gVisor with self-backed internal overlay

Performance Gains

Let’s look at some filesystem-intensive workloads to see how rootfs overlay impacts performance. These benchmarks were run on a gLinux desktop with the KVM platform.

Micro Benchmark

The Linux Test Project provides an fsstress binary. This program performs a large number of filesystem operations concurrently, creating and modifying a large filesystem tree of all sorts of files. We ran this program on the container's root filesystem. The exact usage was:

sh -c "mkdir /test && time fsstress -d /test -n 500 -p 20 -s 1680153482 -X -l 10"

You can use the -v flag (verbose mode) to see what filesystem operations are being performed.

The results were astounding! Rootfs overlay reduced the time to run this fsstress program from 262.79 seconds to 3.18 seconds! However, note that such microbenchmarks are not representative of real-world applications and we should not extrapolate these results to real-world performance.

Real-world Benchmark

Build jobs are very filesystem-intensive workloads. They read a lot of source files, compile, and write out binaries and object files. Let’s consider building the abseil-cpp project with Bazel. Bazel performs a lot of filesystem operations in the rootfs, in Bazel’s cache located at ~/.cache/bazel/.

This is representative of the real world because many other applications also use the container root filesystem as scratch space, due to the handy property that it disappears on container exit. To make this more realistic, the abseil-cpp repo was attached to the container using a bind mount, which does not have an overlay.

When measuring performance, we care about reducing the sandboxing overhead and bringing gVisor performance as close as possible to unsandboxed performance. Sandboxing overhead can be calculated using the formula overhead = (s-n)/n, where ‘s’ is the time taken to run a workload inside the gVisor sandbox and ‘n’ is the time taken to run the same workload natively (unsandboxed). For example, a workload that takes 15 seconds sandboxed and 10 seconds natively has an overhead of (15-10)/10 = 50%. The following graph shows that rootfs overlay halved the sandboxing overhead for the abseil build!

The impact of rootfs overlay on sandboxing overhead for abseil build

Conclusion

Rootfs overlay in gVisor substantially improves performance for many filesystem-intensive workloads, so that developers no longer have to make large tradeoffs between performance and security. We recently made this optimization the default in runsc. This is part of our ongoing efforts to improve gVisor performance. You can learn more about gVisor at gvisor.dev. You can also use gVisor in GKE with GKE Sandbox. Happy sandboxing!

Evolution of Crash Management: Behind the Scenes with App Quality Insights

Posted by Rebecca Gutteridge, Senior Developer Relations Engineer

Hey there! I’m Rebecca Gutteridge, Senior Developer Relations Engineer at Google. As someone who has been working closely with developers to understand how we can make the Android platform better, I’m passionate about helping developers improve their app quality to create amazing experiences for users. In 2022 we announced Android Studio’s App Quality Insights (AQI) window, which enables developers to discover, investigate, and reproduce issues reported by Firebase Crashlytics directly within the context of your local Android Studio project. This is a big step in how Android developers can improve their app stability, and I wanted to learn more about the evolution of how mobile developers have managed crashes throughout the years. You can watch the Behind the Scenes video on AQI here, within the latest episode of #TheAndroidShow.


Early Days of Crash Management

I first chatted with Annyce Davis, VP of Engineering at Meetup and Android GDE. She has been in the mobile development space since 2010 and has a lot of hands-on experience debugging user experiences.

“In the early days, developers cared deeply about user crashes, but they didn’t have the tools to replicate or debug the issue, or to understand which users were being impacted. I remember spending lots of time trying to reproduce issues based on minimal information from bug reports.

One time I remember attempting to debug an experience only happening in a specific country, and no matter how many times I tried, I was unable to reproduce it. It wasn’t until I traveled there in person, I realized people were often using 2G. It never dawned on me to check the connection type!” -Annyce Davis

moving image of Annyce Davis, VP of Engineering at Meetup and Android GDE during the App Quality Insights segment of #TheAndroidShow


Firebase Crashlytics Changes the Game

Crashlytics was introduced in 2011 and it has helped developers track, prioritize, and fix app crashes faster. Annyce told me this was a game changer for crash management.

Moving image of text reads 'Crashlytics helps developers track, prioritize, and fix crashes faster'

“We could now know which devices were experiencing issues, could be notified of trending issues, and finally we were able to show non-technical stakeholders crashes visually, to create buy-in for urgent work.

My team received crash reports for a particular screen of the Meetup app, but we could never reproduce the issue given how inconsistent it was. First, Crashlytics helped us narrow down which feature to examine. We found a crash that was due to a null pointer exception on data that we never expected to be null, so it didn’t seem like the crash could even be possible! An engineer on my team was able to use this data from Crashlytics to uncover that the source was a race condition that would lead to the null, and then he was able to fix it.” -Annyce Davis

What a tricky bug, how fascinating!

Behind the Scenes of AQI

I wanted to learn more about the idea behind AQI, so I chatted with David Motsonashvili, a software engineer on the Firebase team who worked on the initial prototype.

“The original idea for the integration came from a quarterly Hackweek, where we were able to experiment on our own projects. We know Android developers use both Firebase console and Android Studio, so I had an idea to integrate Firebase into Android Studio to reduce their need to switch between the two.

The first prototype for this project was actually an integration with Firebase Performance Monitoring and Android Studio, but we realized Crashlytics would have a much bigger impact on developer workflow as an integration in Android Studio, so we pivoted in that direction instead, and the rest is history!” -David Motsonashvili

Moving stylized image of Android and Firebase logos

I loved that the idea came from wanting to help developers and make our tools easier for them to use! I asked David if he had any fun stories about the project.

“We had to be really scrappy about showing our test app's Crashlytics crash data in the IDE because of limitations we had with the API. It was a really fun project to figure out how to work around this during Hackweek!” -David Motsonashvili

I wanted to better understand how AQI evolved from being an idea during Hackweek, to where it is today.

“Once we launched the early developer preview we tested this with a few internal Google teams, and they loved it! We also started testing this with Android developers as part of an early access program. Some of the companies we talked to were Adobe, Luno, and Meetup. They had really valuable feedback that directly contributed to the roadmap. One example is when we learned many teams needed a place to collaborate within AQI, so we of course moved forward with adding the Crashlytics notes feature into AQI.” -David Motsonashvili

Moving image of quote text reads 'Directly solves one of our big pain points - Adobe Acrobat Reader' and 'Helps keep my finger on the pulse and resolve issues quickly [...] without leaving Android Studio - Maia Grotepass, Luno'


Modern Crash Management

Annyce and her team were early testers of AQI, and it was fun to learn about what they thought of the feature.

“I was truly happy to be able to go directly from a link in the stacktrace to the code. It was the feature in Android Studio that you never knew you needed! I especially like that you can filter issues based on the different variants in your app. Every engineer that I know and work with is passionate about delivering performant, quality code. App Quality Insights is the next step in the evolution of crash management, it can help engineers have more agency over addressing crashes while they also work on exciting new features.” -Annyce Davis

We’ve certainly come a long way with the tools developers have to manage bugs and crashes.

moving image of Annyce Davis, VP of Engineering at Meetup and Android GDE during the App Quality Insights segment of #TheAndroidShow with quote text reads 'It was the feature in Android Studio that you never knew you needed'


Get started with AQI

If you’re ready to try AQI out for yourself, download the latest version of Android Studio. You can also view the documentation, the guide on Medium, and our demo video to learn more about how to use it.

Introducing Service Weaver: A Framework for Writing Distributed Applications

We are excited to introduce Service Weaver, an open source framework for building and deploying distributed applications. Service Weaver allows you to write your application as a modular monolith and deploy it as a set of microservices.

More concretely, Service Weaver consists of two core pieces:

  1. A set of programming libraries, which let you write your application as a single modular binary, using only native data structures and method calls, and
  2. A set of deployers, which let you configure the runtime topology of your application and deploy it as a set of microservices, either locally or on the cloud of your choosing.
Flow chart of Service Weaver programming libraries from development to execution, moving four modules labeled A through D from application across a level of microservices to deployers labeled Desktop, Google Cloud, and Other Cloud

By decoupling the process of writing the application from runtime considerations such as how the application is split into microservices, what data serialization formats are used, and how services are discovered, Service Weaver aims to improve distributed application development velocity and performance.

Motivation for Building Service Weaver

While writing microservices-based applications, we found that the overhead of maintaining multiple different microservice binaries—with their own configuration files, network endpoints, and serializable data formats—significantly slowed our development velocity.

More importantly, microservices severely impacted our ability to make cross-binary changes. They made us do things like flag-gate new features in each binary, evolve our data formats carefully, and maintain intimate knowledge of our rollout processes. Finally, having a predetermined number of specific microservices effectively froze our APIs; they became so difficult to change that it was easier to squeeze all of our changes into the existing APIs rather than evolve them.

As a result, we wished we had a single monolithic binary to work with. Monolithic binaries are easy to write: they use only language-native types and method calls. They are also easy to update: just edit the source code and re-deploy. They are easy to run locally or in a VM: simply execute the binary.

Service Weaver is a framework that offers the best of both worlds: the development velocity of a monolith, with the scalability, security, and fault-tolerance of microservices.

Service Weaver Overview

The core idea of Service Weaver is its modular monolith model. You write a single binary, using only language-native data structures and method calls. You organize your binary as a set of modules, called components, which are native types in the programming language. For example, here is a simple application written in Go using Service Weaver. It consists of a main() function and a single Adder component:
    package main

    import (
        "context"
        "fmt"
        "log"

        "github.com/ServiceWeaver/weaver"
    )

    type Adder interface {
        Add(context.Context, int, int) (int, error)
    }

    type adder struct {
        weaver.Implements[Adder]
    }

    func (adder) Add(_ context.Context, x, y int) (int, error) {
        return x + y, nil
    }

    func main() {
        ctx := context.Background()
        root := weaver.Init(ctx)
        adder, err := weaver.Get[Adder](root)
        if err != nil {
            log.Fatal(err)
        }
        sum, err := adder.Add(ctx, 1, 2)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(sum) // prints 3
    }
When running the above application, you can make a trivial configuration choice of whether to place the Adder component together with the main() function or to place it separately. When the Adder component is separate, the Service Weaver framework automatically translates the Add call into a cross-machine RPC; otherwise, the Add call remains a local method call.

To make a change to the above application, such as adding an unbounded number of arguments to the Add method, all you have to do is change the signature of Add, change its call-sites, and re-deploy your application. Service Weaver makes sure that the new version of main() communicates only with the new version of Adder, regardless of whether they are co-located or not. This behavior, combined with using language-native data structures and method calls, allows you to focus exclusively on writing your application logic, without worrying about the deployment topology and inter-service communication (e.g., there are no protos, stubs, or RPC channels in the code).

When it is time to run your application, Service Weaver allows you to run it anywhere—on your local desktop environment or on your local rack of machines or in the cloud—without any changes to your application code. This level of portability is achieved by a clear separation of concerns built into the Service Weaver framework. On one end, we have the programming framework, used for application development. On the other end, we have various deployer implementations, one per deployment environment.
Flow chart depicting Service Weaver Libraries deployer implementations across three separate platforms in one single iteration

This separation of concerns allows you to run your application locally in a single process via go run .; or run it on Google Cloud via weaver gke deploy; or enable and run it on other platforms. In all of these cases, you get the same application behavior without the need to modify or re-compile your application.

What’s in Service Weaver v0.1?

The v0.1 release of Service Weaver includes:

  • The core Go libraries used for writing your applications.
  • A number of deployers used for running your applications locally or on GKE.
  • A set of APIs that allow you to write your own deployers for any other platform.

All of the libraries are released under the Apache 2.0 license. Please be aware that we are likely to introduce breaking changes until version v1.0 is released.

Get Started and Get Involved

While Service Weaver is still in an early development stage, we would like to invite you to use it and share your feedback, thoughts, and contributions.

The easiest way to get started using Service Weaver is to follow the Step-By-Step instructions on our website. If you would like to contribute, please follow our contributor guidelines. To post a question or contact the team directly, use the Service Weaver mailing list.

The team is excited to host a Twitter Space with Kelsey Hightower on March 2nd, at 10am PST. Keep an eye out on the Service Weaver blog for the latest news, updates, and details on future events.

More Resources

  • Visit us at serviceweaver.dev to get the latest information about the project, such as getting started, tutorials, and blog posts.
  • Access one of our Service Weaver repositories on GitHub.

By Srdjan Petrovic and Garv Sawhney, on behalf of the Service Weaver team

Optimize for Android Go: Lessons from Google apps Part 2

Posted by Niharika Arora, Developer Relations Engineer

Building for Android Go involves paying special attention to performance optimizations and resource usage.

In part 1 of this blog, we discussed why developers should consider building for Android Go, shared some tips on optimizing app memory, and identified the standard approach to follow while fixing performance issues. In this blog, we will talk about the other vitals to pay attention to while building apps for Android Go.

Optimize your apps for Android Go

Improve startup latency

Improving app startup time requires a deep understanding of things that affect it.

  • Avoid eager initialization: Avoid doing eager work that may not be needed in your app’s startup sequence. The user launching your app is the most well-known reason a process starts, but WorkManager, JobScheduler, BroadcastReceiver, bound Services, and ContentProviders can also start your app process in the background. Avoid initializing components eagerly in your Application class, like ContentProviders or androidx initializers, if they are not needed for every way your app process can start.
  • Move tasks from the UI thread to a background thread:

    • Identify operations which occupy large time frames and can be optimized.
    • Identify operations that consume more time than expected.
    • Identify operations which cause the main thread to be blocked.

If there are tasks taking a long time and blocking the main thread, try to move them to a background thread or prefer using WorkManager (see the WorkManager sketch after this list).
  • Check third-party library initialization: Lazy-load third-party libraries. Many libraries offer on-demand initialization or options to disable auto-init.
  • Check rendering time of webp/png images:
    • Prefer webp images over jpg/png.
    • Prefer SVG (vector drawables) for small icons.
    • Check if any layouts are invisible but still spend time loading their images.
    • Explore using a lower-resolution image according to the memory capabilities of the device.
    • Remove unnecessary backgrounds/alpha from views.
  • Avoid synchronous IPCs on the UI thread: Check for binder transactions keeping the main thread busy. There are often multiple inter-process communications happening within the app: image/asset loading, third-party library loading, heavy work on the application’s main thread like disk or network access, etc. StrictMode is a useful developer tool to detect such accidental usage (see the StrictMode sketch after this list).
    Identify and measure these transactions to understand:
    • How much time does this take, and is it expected and necessary?
    • Is the main thread sleeping/idle/blocked during this time? If yes, this could be a performance bottleneck.
    • Can this be delayed?
  • Wisely use XML and JSON parsing: The Gboard app optimized file-list parsing by using Java code instead of XML parsing, since the XML files were being parsed into Java objects in memory at runtime. So Gboard converted all the XML into Java code at compile time; the remaining latency is class-loading time, which is much faster. Benchmark both XML and JSON parsing for your app to identify the appropriate library to use.
  • Analyze and fix severe disk read contention: To capture this, use StrictMode in your development environment (see the sketch below). StrictMode:
    • Detects accidental disk or network access on the application's main thread, where UI operations are received and animations take place.
    • Can automatically terminate the app (or log the violation to logcat) when a violation occurs, by applying different penalties.
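
To make the last few bullets concrete, here is a minimal sketch of moving blocking work off the main thread with WorkManager; the worker class and the work it performs are hypothetical placeholders:

    import android.content.Context
    import androidx.work.OneTimeWorkRequestBuilder
    import androidx.work.WorkManager
    import androidx.work.Worker
    import androidx.work.WorkerParameters

    // Hypothetical worker performing disk-heavy setup off the main thread.
    class PrefetchWorker(appContext: Context, params: WorkerParameters) :
        Worker(appContext, params) {
        override fun doWork(): Result {
            // e.g. warm caches, parse configuration files, prune stale data
            return Result.success()
        }
    }

    // Enqueue the work instead of running it inline during startup.
    fun schedulePrefetch(context: Context) {
        WorkManager.getInstance(context)
            .enqueue(OneTimeWorkRequestBuilder<PrefetchWorker>().build())
    }

And a minimal StrictMode setup for catching accidental main-thread disk and network access during development; the Application subclass is a placeholder, and which detectors and penalties you enable is up to you:

    import android.app.Application
    import android.os.StrictMode

    class MyApplication : Application() {
        override fun onCreate() {
            super.onCreate()
            // BuildConfig here is your app module's generated BuildConfig.
            if (BuildConfig.DEBUG) {
                StrictMode.setThreadPolicy(
                    StrictMode.ThreadPolicy.Builder()
                        .detectDiskReads()
                        .detectDiskWrites()
                        .detectNetwork() // flags synchronous network on the main thread
                        .penaltyLog()    // write violations to logcat
                        .build()
                )
            }
        }
    }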

Optimize app size

Users often avoid downloading apps that seem too large, particularly in emerging markets where devices connect to spotty 2G and 3G networks with low bandwidth or work on pay-by-the-byte plans.

  • Remove unnecessary layouts: The Gboard and Camera from Google teams validated which layouts were unused or could be merged with small UI changes, and removed the unnecessary layouts, reducing overall app code size.
  • Migrate to dynamic layouts/views when appropriate: The teams dove deep into their apps to find layouts and views that could be rendered dynamically. They used merge and ViewStub to further optimize their views and layouts.
  • Re-evaluate features with low DAU. Try to disable features which take more memory and make the app less performant: The teams further analyzed their apps to specifically optimize for Android Go and disabled features on Go devices which were not used much but were taking a lot of memory. They removed complex animations, large GIFs, etc. to make space for other parts of the app.
  • Try combining native binaries with common dependencies into one: If the app has different JNI implementations with a lot of common underlying dependencies, all the different binaries add to the APK size with redundant components. The Camera from Google app benefited from combining several JNI binaries into a single JNI binary while keeping the Java and JNI files separate. This helped the app reduce its APK size by several MBs.
  • Try to reduce Dalvik code size: Check for code that is never used at runtime, e.g. large classes and auto-generated code.
    • Code optimizers like ProGuard could help optimize and shrink code size, but they can't deal with code guarded by runtime constants. Replace the checks/flags with compile-time constants to make the most of the optimization tools.
  • Reduce translatable strings size:
      a.    Don’t translate internal-only UI strings. Mark them as translatable="false".

      b.    Remove unused alternative resources: You can use the Android Gradle plugin's resConfigs property to remove alternative resource files that your app does not need. If you are using a library that includes language resources (such as AppCompat or Google Play Services), then your app includes all translated language strings for the messages in those libraries, whether the rest of your app is translated to the same languages or not. If you'd like to keep only the languages that your app officially supports, you can specify those languages using the resConfigs property. Any resources for languages not specified are removed.
       
      The following snippet shows how to limit your language resources to just English and French:

        android {
            defaultConfig {
                ...
                resConfigs "en", "fr"
            }
        }

        You can read more here.

      c.    Don’t translate what doesn’t need translation: If the string does not participate in a UI, it should not be translated. Strings for the purpose of debugging, exception messages, URLs, and so on should be string literals in code, not resources.
        i.    Don’t translate what’s not shown to the user anyway:
        It’s possible to have a String resource that’s practically never shown to the user but is still strongly referenced. One example is in your <activity>: if you have an android:label set and referencing a String resource, but the Activity’s label is never actually shown (e.g. it’s not a launcher Activity and it doesn’t have an app bar that shows its own label).
          
      d.    Don’t translate URLs: Consider this example:

        <string name="car_frx_device_incompatible_sol_message">
        This device doesn\'t support Android Auto.\n
        &lt;a href="https://support.google.com/androidauto/answer/6395843"&gt;Learn more&lt;/a&gt;
        </string>

        You may recognize &lt; and &gt;: these are escape sequences for “<” and “>”. They’re needed here because if you were to put an <a> tag directly inside a <string> tag, the Android resource compiler would just drop it (as it does with all tags it doesn’t recognize).

        However, this means that you’re translating the HTML tags and the URL to 78 languages. Totally unnecessary.

        Instead, remove the part with HTML:

        <string name="car_frx_device_incompatible_sol_message">
        This device doesn\'t support Android Auto.
        </string>

        Note we don’t define a separate string for “Learn more”, because it’s a common string. To produce the HTML snippet for the link, we define "<a href="https://support.google.com/androidauto/answer/6395843">%s</a>" as a string literal in code, and then drop the value of “Learn more” from resources into the format specifier.
         
      e.    Inline untranslated strings: By specifying strings in strings.xml, you can leverage the Android framework to automatically change the actual value used at runtime based on the current configuration (e.g. based on the current locale, show a localized value for the string). Strings that are never translated gain nothing from this machinery, so inline them in code instead.
         
      f.    Remove duplicate strings: If you have the same string multiple times as a literal in code (“Hello World”), it will be shared across all instances. But resources don’t work this way: if you have an identical string under a different name, then unless both are translated identically across 78 languages (unlikely), you’ll end up with duplicates.

        Don’t have duplicate strings!  

      g.    Don’t have separate strings for ALL CAPS or Title Case: Android has built-in support for case mutations.

        Use android:capitalize (available since API level 1). If set, it specifies that the TextView has a textual input method and should automatically capitalize what the user types. The default is "none". For displaying existing text in all caps, android:textAllCaps (API level 14) likewise avoids the need for a separate resource.

  • Reduce asset size: Be mindful of the different target device form factors that your app supports and adjust your assets accordingly.
  • Upload your app with Android App Bundles: The easiest way to gain immediate app size savings when publishing to Google Play is by uploading your app as an Android App Bundle, a new upload format that includes all your app’s compiled code and resources but defers APK generation and signing to Google Play. Read more here.
  • Utilize the dynamic delivery feature if applicable: Play Feature Delivery uses advanced capabilities of app bundles, allowing certain features of your app to be delivered conditionally or downloaded on demand. You can use feature modules for custom delivery. A unique benefit of feature modules is the ability to customize how and when different features of your app are downloaded onto devices running Android 5.0 (API level 21) or higher. Learn more here.

Recap

This part of the blog captured some best practices, recommendations, and learnings from Google apps to optimize your app size and startup latency and improve the experience of your app on Go devices, which helps drive user engagement and adoption for your Android app. In part 3, you will get to know the tools that helped the Google apps identify and fix such performance issues in their apps!