Tag Archives: compiler

Tools, not Rules: become a better Android developer with Compiler Explorer

Posted by Shai Barack – Android Platform Performance lead

Introducing Android support in Compiler Explorer

In a previous blog post you learned how Android engineers continuously improve the Android Runtime (ART) in ways that boost app performance on user devices. These changes to the compiler make system and app code faster or smaller. Developers don’t need to change their code and rebuild their apps to benefit from new optimizations, and users get a better experience. In this blog post I’ll take you inside the compiler with a tool called Compiler Explorer and witness some of these optimizations in action.

Compiler Explorer is an interactive website for studying how compilers work. It is an open source project that anyone can contribute to. This year, our engineers added support to Compiler Explorer for the Java and Kotlin programming languages on Android.

You can use Compiler Explorer to understand how your source code is translated to assembly language, and how high-level programming language constructs in a language like Kotlin become low-level instructions that run on the processor.

At Google our engineers use this tool to study different coding patterns for efficiency, to see how existing compiler optimizations work, to share new optimization opportunities, and to teach and learn. Learning is best when it’s done through tools, not rules. Instead of teaching developers to memorize different rules for how to write efficient code or what the compiler might or might not optimize, give the engineers the tools to find out for themselves what happens when they write their code in different ways, and let them experiment and learn. Let’s learn together!

Start by going to godbolt.org. By default we see C++ sample code, so click the dropdown that says C++ and select Android Java. You should see this sample code:

class Square {
   static int square(int num) {
       return num * num;
   }
}
screenshot of sample code in Compiler Explorer
click to enlarge

On the left you’ll see a very simple program. You might say that this is a one line program. But this is not a meaningful statement in terms of performance - how many lines of code there are doesn’t tell us how long this program will take to run, or how much memory will be occupied by the code when the program is loaded.

On the right you’ll see a disassembly of the compiler output. This is expressed in terms of assembly language for the target architecture, where every line is a CPU instruction. Looking at the instructions, we can say that the implementation of the square(int num) method consists of 2 instructions in the target architecture. The number and type of instructions give us a better idea for how fast the program is than the number of lines of source code. Since the target architecture is AArch64 aka ARM64, every instruction is 4 bytes, which means that our program’s code occupies 8 bytes in RAM when the program is compiled and loaded.

Let’s take a brief detour and introduce some Android toolchain concepts.


The Android build toolchain (in brief)

a flow diagram of the Android build toolchain

When you write your Android app, you’re typically writing source code in the Java or Kotlin programming languages. When you build your app in Android Studio, it’s initially compiled by a language-specific compiler into language-agnostic JVM bytecode in a .jar. Then the Android build tools transform the .jar into Dalvik bytecode in .dex files, which is what the Android Runtime executes on Android devices. Typically developers use d8 in their Debug builds, and r8 for optimized Release builds. The .dex files go in the .apk that you push to test devices or upload to an app store. Once the .apk is installed on the user’s device, an on-device compiler which knows the specific target device architecture can convert the bytecode to instructions for the device’s CPU.

We can use Compiler Explorer to learn how all these tools come together, and to experiment with different inputs and see how they affect the outputs.

Going back to our default view for Android Java, on the left is Java source code and on the right is the disassembly for the on-device compiler dex2oat, the very last step in our toolchain diagram. The target architecture is ARM64 as this is the most common CPU architecture in use today by Android devices.

The ARM64 Instruction Set Architecture offers many instructions and extensions, but as you read disassemblies you will find that you only need to memorize a few key instructions. You can look for ARM64 Quick Reference cards online to help you read disassemblies.

At Google we study the output of dex2oat in Compiler Explorer for different reasons, such as:

    • Gaining intuition for what optimizations the compiler performs in order to think about how to write more efficient code.
    • Estimating how much memory will be required when a program with this snippet of code is loaded into memory.
    • Identifying optimization opportunities in the compiler - ways to generate instructions for the same code that are more efficient, resulting in faster execution or in lower memory usage without requiring app developers to change and rebuild their code.
    • Troubleshooting compiler bugs! 🐞

Compiler optimizations demystified

Let’s look at a real example of compiler optimizations in practice. In the previous blog post you can read about compiler optimizations that the ART team recently added, such as coalescing returns. Now you can see the optimization, with Compiler Explorer!

Let’s load this example:

class CoalescingReturnsDemo {
   String intToString(int num) {
       switch (num) {
           case 1:
               return "1";
           case 2:
               return "2";
           case 3:
               return "3";           
           default:
               return "other";
       }
   }
}
screenshot of sample code in Compiler Explorer
click to enlarge

How would a compiler implement this code in CPU instructions? Every case would be a branch target, with a case body that has some unique instructions (such as referencing the specific string) and some common instructions (such as assigning the string reference to a register and returning to the caller). Coalescing returns means that some instructions at the tail of each case body can be shared across all cases. The benefits grow for larger switches, proportional to the number of the cases.

You can see the optimization in action! Simply create two compiler windows, one for dex2oat from the October 2022 release (the last release before the optimization was added), and another for dex2oat from the November 2023 release (the first release after the optimization was added). You should see that before the optimization, the size of the method body for intToString was 124 bytes. After the optimization, it’s down to just 76 bytes.

This is of course a contrived example for simplicity’s sake. But this pattern is very common in Android code. For instance consider an implementation of Handler.handleMessage(Message), where you might implement a switch statement over the value of Message#what.

How does the compiler implement optimizations such as this? Compiler Explorer lets us look inside the compiler’s pipeline of optimization passes. In a compiler window, click Add New > Opt Pipeline. A new window will open, showing the High-level Internal Representation (HIR) that the compiler uses for the program, and how it’s transformed at every step.

screenshot of the high-level internal representation (HIR) the compiler uses for the program in Compiler Explorer
click to enlarge

If you look at the code_sinking pass you will see that the November 2023 compiler replaces Return HIR instructions with Goto instructions.

Most of the passes are hidden when Filters > Hide Inconsequential Passes is checked. You can uncheck this option and see all optimization passes, including ones that did not change the HIR (i.e. have no “diff” over the HIR).

Let’s study another simple optimization, and look inside the optimization pipeline to see it in action. Consider this code:

class ConstantFoldingDemo {
   static int demo(int num) {
       int result = num;
       if (num == 2) {
           result = num + 2;
       }
       return result;
   }
}

The above is functionally equivalent to the below:

class ConstantFoldingDemo {
   static int demo(int num) {
       int result = num;
       if (num == 2) {
           result = 4;
       }
       return result;
   }
}

Can the compiler make this optimization for us? Let’s load it in Compiler Explorer and turn to the Opt Pipeline Viewer for answers.

screenshot of Opt Pipeline Viewer in Compiler Explorer
click to enlarge

The disassembly shows us that the compiler never bothers with “two plus two”, it knows that if num is 2 then result needs to be 4. This optimization is called constant folding. Inside the conditional block where we know that num == 2 we propagate the constant 2 into the symbolic name num, then fold num + 2 into the constant 4.

You can see this optimization happening over the compiler’s IR by selecting the constant_folding pass in the Opt Pipeline Viewer.

Kotlin and Java, side by side

Now that we’ve seen the instructions for Java code, try changing the language to Android Kotlin. You should see this sample code, the Kotlin equivalent of the basic Java sample we’ve seen before:

fun square(num: Int): Int = num * num
screenshot of sample code in Kotlin in Compiler Explorer
click to enlarge

You will notice that the source code is different but the sample program is functionally identical, and so is the output from dex2oat. Finding the square of a number results in the same instructions, whether you write your source code in Java or in Kotlin.

You can take this opportunity to study interesting language features and discover how they work. For instance, let’s compare Java String concatenation with Kotlin String interpolation.

In Java, you might write your code as follows:

class StringConcatenationDemo {
   void stringConcatenationDemo(String myVal) {
       System.out.println("The value of myVal is " + myVal);
   }
}

Let’s find out how Java String concatenation actually works by trying this example in Compiler Explorer.

screenshot of sample code in Kotlin in Compiler Explorer
click to enlarge

First you will notice that we changed the output compiler from dex2oat to d8. Reading Dalvik bytecode, which is the output from d8, is usually easier than reading the ARM64 instructions that dex2oat outputs. This is because Dalvik bytecode uses higher level concepts. Indeed you can see the names of types and methods from the source code on the left side reflected in the bytecode on the right side. Try changing the compiler to dex2oat and back to see the difference.

As you read the d8 output you may realize that Java String concatenation is actually implemented by rewriting your source code to use a StringBuilder. The source code above is rewritten internally by the Java compiler as follows:

class StringConcatenationDemo {
   void stringConcatenationDemo(String myVal) {
       StringBuilder sb = new StringBuilder();
       sb.append("The value of myVal is ");
       sb.append(myVal);
       System.out.println(sb.toString());
  }
}

In Kotlin, we can use String interpolation:

fun stringInterpolationDemo(myVal: String) {
   System.out.println("The value of myVal is $myVal");
}

The Kotlin syntax is easier to read and write, but does this convenience come at a cost? If you try this example in Compiler Explorer, you may find that the Dalvik bytecode output is roughly the same! In this case we see that Kotlin offers an improved syntax, while the compiler emits similar bytecode.

At Google we study examples of language features in Compiler Explorer to learn about how high-level language features are implemented in lower-level terms, and to better inform ourselves on the different tradeoffs that we might make in choosing whether and how to adopt these language features. Recall our learning principle: tools, not rules. Rather than memorizing rules for how you should write your code, use the tools that will help you understand the upsides and downsides of different alternatives, and then make an informed decision.

What happens when you minify your app?

Speaking of making informed decisions as an app developer, you should be minifying your apps with R8 when building your Release APK. Minifying generally does three things to optimize your app to make it smaller and faster:

      1. Dead code elimination: find all the live code (code that is reachable from well-known program entry points), which tells us that the remaining code is not used, and therefore can be removed.

      2. Bytecode optimization: various specialized optimizations that rewrite your app’s bytecode to make it functionally identical but faster and/or smaller.

      3. Obfuscation: renaming all types, methods, and fields in your program that are not accessed by reflection (and therefore can be safely renamed) from their names in source code (com.example.MyVeryLongFooFactorySingleton) to shorter names that fit in less memory (a.b.c).

Let’s see an example of all three benefits! Start by loading this view in Compiler Explorer.

screenshot of sample code in Kotlin in Compiler Explorer
click to enlarge

First you will notice that we are referencing types from the Android SDK. You can do this in Compiler Explorer by clicking Libraries and adding Android API stubs.

Second, you will notice that this view has multiple source files open. The Kotlin source code is in example.kt, but there is another file called proguard.cfg.

-keep class MinifyDemo {
   public void goToSite(...);
}

Looking inside this file, you’ll see directives in the format of Proguard configuration flags, which is the legacy format for configuring what to keep when minifying your app. You can see that we are asking to keep a certain method of MinifyDemo. “Keeping” in this context means don’t shrink (we tell the minifier that this code is live). Let’s say we’re developing a library and we’d like to offer our customer a prebuilt .jar where they can call this method, so we’re keeping this as part of our API contract.

We set up a view that will let us see the benefits of minifying. On one side you’ll see d8, showing the dex code without minification, and on the other side r8, showing the dex code with minification. By comparing the two outputs, we can see minification in action:

      1. Dead code elimination: R8 removed all the logging code, since it never executes (as DEBUG is always false). We removed not just the calls to android.util.Log, but also the associated strings.

      2. Bytecode optimization: since the specialized methods goToGodbolt, goToAndroidDevelopers, and goToGoogleIo just call goToUrl with a hardcoded parameter, R8 inlined the calls to goToUrl into the call sites in goToSite. This inlining saves us the overhead of defining a method, invoking the method, and returning from the method.

      3. Obfuscation: we told R8 to keep the public method goToSite, and it did. R8 also decided to keep the method goToUrl as it’s used by goToSite, but you’ll notice that R8 renamed that method to a. This method’s name is an internal implementation detail, so obfuscating its name saved us a few precious bytes.

You can use R8 in Compiler Explorer to understand how minification affects your app, and to experiment with different ways to configure R8.

At Google our engineers use R8 in Compiler Explorer to study how minification works on small samples. The authoritative tool for studying how a real app compiles is the APK Analyzer in Android Studio, as optimization is a whole-program problem and a snippet might not capture every nuance. But iterating on release builds of a real app is slow, so studying sample code in Compiler Explorer helps our engineers quickly learn and iterate.

Google engineers build very large apps that are used by billions of people on different devices, so they care deeply about these kinds of optimizations, and strive to make the most use out of optimizing tools. But many of our apps are also very large, and so changing the configuration and rebuilding takes a very long time. Our engineers can now use Compiler Explorer to experiment with minification under different configurations and see results in seconds, not minutes.

You may wonder what would happen if we changed our code to rename goToSite? Unfortunately our build would break, unless we also renamed the reference to that method in the Proguard flags. Fortunately, R8 now natively supports Keep Annotations as an alternative to Proguard flags. We can modify our program to use Keep Annotations:

@UsedByReflection(kind = KeepItemKind.CLASS_AND_METHODS)
public static void goToSite(Context context, String site) {
    ...
}

Here is the complete example. You’ll notice that we removed the proguard.cfg file, and under Libraries we added “R8 keep-annotations”, which is how we’re importing @UsedByReflection.

At Google our engineers prefer annotations over flags. Here we’ve seen one benefit of annotations - keeping the information about the code in one place rather than two makes refactors easier. Another is that the annotations have a self-documenting aspect to them. For instance if this method was kept actually because it’s called from native code, we would annotate it as @UsedByNative instead.

Baseline profiles and you

Lastly, let’s touch on baseline profiles. So far you saw some demos where we looked at dex code, and others where we looked at ARM64 instructions. If you toggle between the different formats you will notice that the high-level dex bytecode is much more compact than low-level CPU instructions. There is an interesting tradeoff to explore here - whether, and when, to compile bytecode to CPU instructions?

For any program method, the Android Runtime has three compilation options:

      1. Compile the method Just in Time (JIT).

      2. Compile the method Ahead of Time (AOT).

      3. Don’t compile the method at all, instead use a bytecode interpreter.

Running code in an interpreter is an order of magnitude slower, but doesn’t incur the cost of loading the representation of the method as CPU instructions which as we’ve seen is more verbose. This is best used for “cold” code - code that runs only once, and is not critical to user interactions.

When ART detects that a method is “hot”, it will be JIT-compiled if it’s not already been AOT compiled. JIT compilation accelerates execution times, but pays the one-time cost of compilation during app runtime. This is where baseline profiles come in. Using baseline profiles, you as the app developer can give ART a hint as to which methods are going to be hot or otherwise worth compiling. ART will use that hint before runtime, compiling the code AOT (usually at install time, or when the device is idle) rather than at runtime. This is why apps that use Baseline Profiles see faster startup times.

With Compiler Explorer we can see Baseline Profiles in action.

Let’s open this example.

screenshot of sample code in Compiler Explorer
click to enlarge

The Java source code has two method definitions, factorial and fibonacci. This example is set up with a manual baseline profile, listed in the file profile.prof.txt. You will notice that the profile only references the factorial method. Consequently, the dex2oat output will only show compiled code for factorial, while fibonacci shows in the output with no instructions and a size of 0 bytes.

In the context of compilation modes, this means that factorial is compiled AOT, and fibonacci will be compiled JIT or interpreted. This is because we applied a different compiler filter in the profile sample. This is reflected in the dex2oat output, which reads: “Compiler filter: speed-profile” (AOT compile only profile code), where previous examples read “Compiler filter: speed” (AOT compile everything).

Conclusion

Compiler Explorer is a great tool for understanding what happens after you write your source code but before it can run on a target device. The tool is easy to use, interactive, and shareable. Compiler Explorer is best used with sample code, but it goes through the same procedures as building a real app, so you can see the impact of all steps in the toolchain.

By learning how to use tools like this to discover how the compiler works under the hood, rather than memorizing a bunch of rules of optimization best practices, you can make more informed decisions.

Now that you've seen how to use the Java and Kotlin programming languages and the Android toolchain in Compiler Explorer, you can level up your Android development skills.

Lastly, don't forget that Compiler Explorer is an open source project on GitHub. If there is a feature you'd like to see then it's just a Pull Request away.


Java and OpenJDK are trademarks or registered trademarks of Oracle and/or its affiliates.

The secret to Android’s improved memory on 1B+ Devices: The latest Android Runtime update

Posted by Santiago Aboy Solanes - Software Engineer

The Android Runtime (ART) executes Dalvik bytecode produced from apps and system services written in the Java or Kotlin languages. We constantly improve ART to generate smaller and more performant code. Improving ART makes the system and user-experience better as a whole, as it is the common denominator in Android apps. In this blog post we will talk about optimizations that reduce code size without impacting performance.

Code size is one of the key metrics we look at, since smaller generated files are better for memory (both RAM and storage). With the new release of ART, we estimate saving users about 50-100MB per device. This could be just the thing you need to be able to update your favorite app, or download a new one. Since ART has been updateable from Android 12, these optimizations reach 1B+ devices for whom we are saving 47-95 petabytes (47-95 millions of GB!) globally!

All the improvements mentioned in this blog post are open source. They are available through the ART mainline update so you don’t even need a full OS update to reap the benefits. We can have our upside-down cake and eat it too!

Optimizing compiler 101

ART compiles applications from the DEX format to native code using the on-device dex2oat tool. The first step is to parse the DEX code and generate an Intermediate Representation (IR). Using the IR, dex2oat performs a number of code optimizations. The last step of the pipeline is a code generation phase where dex2oat converts the IR into native code (for example, AArch64 assembly).

The optimization pipeline has phases that execute in order that each concentrate on a particular set of optimizations. For example, Constant Folding is an optimization that tries to replace instructions with constant values like folding the addition operation 2 + 3 into a 5.

ART's optimization pipeline overview with an example showing we can combine the addition of 2 plus 3 into a 5

The IR can be printed and visualized, but is very verbose compared to Kotlin language code. For the purposes of this blog post, we will show what the optimizations do using Kotlin language code, but know that they are happening to IR code.

Code size improvements

For all code size optimizations, we tested our optimizations on over half a million APKs present in the Google Play Store and aggregated the results.

Eliminating write barriers

We have a new optimization pass that we named Write Barrier Elimination. Write barriers track modified objects since the last time they were examined by the Garbage Collector (GC) so that the GC can revisit them. For example, if we have:

Example showing that we can eliminate redundant write barriers if there a GC cannot happen between  set instructionsPreviously, we would emit a write barrier for each object modification but we only need a single write barrier because: 1) the mark will be set in o itself (and not in the inner objects), and 2) a garbage collection can't have interacted with the thread between those sets.

If an instruction may trigger a GC (for example, Invokes and SuspendChecks), we wouldn't be able to eliminate write barriers. In the following example, we can't guarantee a GC won't need to examine or modify the tracking information between the modifications:

Example showing that we can't eliminate redundant write barriers because a GC may happen between set instructionsImplementing this new pass contributes to 0.8% code size reduction.

Implicit suspend checks

Let's assume we have several threads running. Suspend checks are safepoints (represented by the houses in the image below) where we can pause the thread execution. Safepoints are used for many reasons, the most important of them being Garbage Collection. When a safepoint call is issued, the threads must go into a safepoint and are blocked until they are released.

The previous implementation was an explicit boolean check. We would load the value, test it, and branch into the safepoint if needed.

Shows the explicit suspend check (load + test + branch) when multiple threads are running

Implicit suspend checks is an optimization that eliminates the need for the test and branch instructions. Instead, we only have one load: if the thread needs to suspend, that load will trap and the signal handler will redirect the code to a suspend check handler as if the method made a call.

Shows the implicit suspend check (two loads: the first one loads null and the second one traps) when multiple threads are running

Going into a bit more detail, a reserved register rX is pre-loaded with an address within the thread where we have a pointer pointing to itself. As long as we don't need to do a suspend check, we keep that self-pointing pointer. When we need to do a suspend check, we clear the pointer and once it becomes visible to the thread the first LDR rX, [rX] will load null and the second will segfault.

The suspend request is essentially asking the thread to suspend some time soon, so the minor delay of having to wait for the second load is okay.

This optimization reduces code size by 1.8%.

Coalescing returns

It is common for compiled methods to have an entry frame. If they have it, those methods have to deconstruct it when they return, which is also known as an exit frame. If a method has several return instructions, it will generate several exit frames, one per return instruction.

By coalescing the return instructions into one, we are able to have one return point and are able to remove the extra exit frames. This is especially useful for switch cases with multiple return statements. Switch case optimized by having one return instead of multiple return instructions

Coalescing returns reduces code size by 1%.

Other optimization improvements

We improved a lot of our existing optimization passes. For this blog post, we will group them up in the same section, but they are independent of each other. All the optimizations in the following section contribute to a 5.7% code size reduction.

Code Sinking

Code sinking is an optimization pass that pushes down instructions to uncommon branches like paths that end with a throw. This is done to reduce wasted cycles on instructions that are likely not going to be used.

We improved code sinking in graphs with try catches: we now allow sinking code as long as we don't sink it inside of a different try than the one it started in (or inside of any try if it wasn't in one to begin with).

Code sinking optimizations in the presence of a try catch

In the first example, we can sink the Object creation since it will only be used in the if(flag) path and not the other and it is within the same try. With this change, at runtime it will only be run if flag is true. Without getting into too much technical detail, what we can sink is the actual object creation, but loading the Object class still remains before the if. This is hard to show with Kotlin code, as the same Kotlin line turns into several instructions at the ART compiler level.

In the second example, we cannot sink the code as we would be moving an instance creation (which may throw) inside of another try.

Code Sinking is mostly a runtime performance optimization, but it can help reduce the register pressure. By moving instructions closer to their uses, we can use fewer registers in some cases. Using fewer registers means fewer move instructions, which ends up helping code size.

Loop optimization

Loop optimization helps eliminate loops at compile time. In the following example, the loop in foo will multiply a by 10, 10 times. This is the same as multiplying by 100. We enabled loop optimization to work in graphs with try catches.

Loop optimization in the presence of a try catch

In foo, we can optimize the loop since the try catch is unrelated.

In bar or baz, however, we don't optimize it. It is not trivial to know the path the loop will take if we have a try in the loop, or if the loop as a whole is inside of a try.

Dead code elimination – Remove unneeded try blocks

We improved our dead code elimination phase by implementing an optimization that removes try blocks that don't contain throwing instructions. We are also able to remove some catch blocks, as long as no live try blocks point to it.

In the following example, we inline bar into foo. After that, we know that the division cannot throw. Later optimization passes can leverage this and improve the code.

We can remove tries which contain no throwing instructions

Just removing the dead code from the try catch is good enough, but even better is the fact that in some cases we allow other optimizations to take place. If you recall, we don't do loop optimization when the loop has a try, or it's inside of one. By eliminating this redundant try/catch, we can loop optimize producing smaller and faster code.

Another example of removing tries which contain no throwing instructions

Dead code elimination – SimplifyAlwaysThrows

During the dead code elimination phase, we have an optimization we call SimplifyAlwaysThrows. If we detect that an invoke will always throw, we can safely discard whatever code we have after that method call since it will never be executed.

We also updated SimplifyAlwaysThrows to work in graphs with try catches, as long as the invoke itself is not inside of a try. If it is inside of a try, we might jump to a catch block, and it gets harder to figure out the exact path that will be executed.

We can use the SimplifyAlwaysThrows optimization as long as the invoke itself is not inside of a try

We also improved:

  • Detection when an invoke will always throw by looking at their parameters. On the left, we will mark divide(1, 0) as always throwing even when the generic method doesn't always throw.
  • SimplifyAlwaysThrows to work on all invokes. Previously we had restrictions for example don't do it for invokes leading to an if, but we could remove all of the restrictions.

We improved detection, and removed some of the restrictions from this optimization

Load Store Elimination – Working with try catch blocks

Load store elimination (LSE) is an optimization pass that removes redundant loads and stores.

We improved this pass to work with try catches in the graph. In foo, we can see that we can do LSE normally if the stores/loads are not directly interacting with the try. In bar, we can see an example where we either go through the normal path and don't throw, in which case we return 1; or we throw, catch it and return 2. Since the value is known for every path, we can remove the redundant load.

Examples showing we can perform Load Store Elimination in graphs with try catches, as long as the instructions are not inside of a try

Load Store Elimination – Working with release/acquire operations

We improved our load store elimination pass to work in graphs with release/acquire operations. These are volatile loads, stores, and monitor operations. To clarify, this means that we allow LSE to work in graphs that have those operations, but we don't remove said operations.

In the example, i and j are regular ints, and vi is a volatile int. In foo, we can skip loading the values since there's not a release/acquire operation between the sets and the loads. In bar, the volatile operation happens between them so we can't eliminate the normal loads. Note that it doesn't matter that the volatile load result is not used—we cannot eliminate the acquire operation.

Examples showing that we can perform LSE in graphs with release/acquire operations. Note that the release/acquire operations themselves are not removed since they are needed for synchronization.

This optimization works similarly with volatile stores and monitor operations (which are synchronized blocks in Kotlin).

New inliner heuristic

Our inliner pass has a wide range of heuristics. Sometimes we decide not to inline a method because it is too big, or sometimes to force inlining of a method because it is too small (for example, empty methods like Object initialization).

We implemented a new inliner heuristic: Don't inline invokes leading to a throw. If we know we are going to throw we will skip inlining those methods, as throwing itself is costly enough that inlining that code path is not worth it.

We had three families of methods that we are skipping to inline:

  • Calculating and printing debug information before a throw.
  • Inlining the error constructor itself.
  • Finally blocks are duplicated in our optimizing compiler. We have one for the normal case (i.e. the try doesn't throw), and one for the exceptional case. We do this because in the exceptional case we have to: catch, execute the finally block, and rethrow. The methods in the exceptional case will now not be inlined, but the ones in the normal case will.

Examples showing:
* calculating and printing debug information before a throw
* inlining the error constructor itself
* methods in finally blocks

Constant folding

Constant folding is the optimization pass that changes operations into constants if possible. We implemented an optimization that propagates variables known to be constant when used in if guards. Having more constants in the graph allows us to perform more optimizations later.

In foo, we know that a has the value 2 in the if guard. We can propagate that information and deduce that b must be 4. In a similar vein, in bar we know that cond must be true in the if case and false in the else case (simplifying the graphs).

Example showing that if we know that a variable has a constant value within the scope of an 
 `if`, we will propagate that example within said scope

Putting it all together

If we take into account all code size optimizations in this blog post we achieved a 9.3% code size reduction!

To put things into perspective, an average phone can have 500MB-1GB in optimized code (The actual number can be higher or lower, depending on how many apps you have installed, and which particular apps you installed), so these optimizations save about 50-100MB per device. Since these optimizations reach 1B+ devices, we are saving 47-95 petabytes globally!

Further reading

If you are interested in the code changes themselves, feel free to take a look. All the improvements mentioned in this blog post are open source. If you want to help Android users worldwide, consider contributing to the Android Open Source Project!

  • Write barrier elimination: 1
  • Implicit suspend checks: 1
  • Coalescing returns: 1
  • Code sinking: 1, 2
  • Loop optimization: 1
  • Dead code elimination: 1, 2, 3, 4
  • Load store elimination: 1, 2, 3, 4
  • New inlining heuristic: 1
  • Constant folding: 1

Java is a trademark or registered trademark of Oracle and/or its affiliates

API desugaring supporting Android 13 and java.nio

Posted by Clément Béra, Software engineer

We are happy to announce the release of a new version of API desugaring based on Android 13 and Java 11 language APIs. API desugaring allows developers to use more APIs without requiring a minimum API level for your app. This reduces friction during development. You are now free to use the java.nio APIs no matter which Android version is on the user’s device. For example, libraries such as kotlin.io.path are now available on all Android devices, including devices running Android 7 and lower.

In addition to supporting java.nio, API desugaring of java.time and java.util.stream has been updated to support APIs added up to Android 13. You can use recent APIs in your Android app such as Collectors#toUnmodifiableList.

New API desugaring specifications

To enable API desugaring, you set isCoreLibraryDesugaringEnabled and add the coreLibraryDesugaring dependency in the build.gradle.kts file.

android { ... compileSdk = 33 defaultConfig { ... // Required when setting minSdkVersion to 20 or lower. multiDexEnabled = true ... } ... compileOptions { // Flag to enable support for API desugaring. isCoreLibraryDesugaringEnabled = true // Sets Java compatibility to Java 8 sourceCompatibility = JavaVersion.VERSION_1_8 targetCompatibility = JavaVersion.VERSION_1_8 } kotlinOptions { jvmTarget = "1.8" } } dependencies { // Dependency required for API desugaring. coreLibraryDesugaring("com.android.tools:desugar_jdk_libs_nio:2.0.2") }

The new version 2.0 release comes in 3 flavors:

  • com.android.tools:desugar_jdk_libs_nio:2.0.2 - the nio version includes all the desugaring available including the java.nio, java.time, stream, and functions APIs.
  • com.android.tools:desugar_jdk_libs:2.0.2 - the default version includes desugaring for the java.time, stream, and functions APIs. It’s similar to version 1.x API desugaring already available, but updated with APIs added up to Android 13.
  • com.android.tools:desugar_jdk_libs_minimal:2.0.2 - the minimal version includes only the java.util.function package and bug fixes on concurrent collections. It’s designed for minimal code size overhead.

Opting into more desugaring features will lead to a larger impact on your app’s code size. The minimal specification has, as its name indicates, a minimal impact on the code size of the app. The nio specification has the most impact.

The new java.nio APIs

The new java.nio APIs supported in API desugaring include:

  • All the classes and APIs in java.nio.file such as BasicFileAttributes, file manipulation, or usage of java.nio.file.Path.
  • Some extensions of java.nio.channels, such as the FileChannel#open methods.
  • A few utility methods such as File#toPath.

The following code snippet illustrates how you can now use the new java.nio APIs on all devices, including devices running Android 7 and lower, through the methods of kotlin.io.path which depend on java.nio.file.Files. A temp file can be created, written into, read, and its basic attributes and its existence can be queried using the new java.nio APIs.

import android.util.Log import java.nio.file.StandardOpenOption.APPEND import kotlin.io.path.createTempDirectory import kotlin.io.path.deleteIfExists import kotlin.io.path.exists import kotlin.io.path.fileSize import kotlin.io.path.readLines import kotlin.io.path.writeLines ... val TAG = "java.nio Test" val tempDirectory = createTempDirectory("tempFile") val tempFile = tempDirectory.resolve("tempFile") tempFile.writeLines(listOf("first")) tempFile.writeLines(listOf("second"), options = arrayOf(APPEND)) Log.d(TAG,"Content: ${tempFile.readLines()}") Log.d(TAG,"Size: ${tempFile.fileSize()}") Log.d(TAG,"Exists (before deletion): ${tempFile.exists()}") tempFile.deleteIfExists() Log.d(TAG,"Exists (after deletion): ${tempFile.exists()}")

// Resulting logcat output. Content: first second Size: 13 Exists (before deletion): true Exists (after deletion): false

A few features however cannot be emulated for devices running Android 7 and lower and instead throw an instance of UnsupportedOperationException or return null. They still work on devices running Android 8 or higher, so existing code guarded by an API level check should work as it used to. See the complete list of APIs available and the known limitations.

The code has been extensively tested, but we are looking for additional inputs from app developers.

Please try out the new version of API desugaring, and let us know how it worked for you!

For additional background see the post Support for newer Java language APIs from when API desugaring was introduced.

Java and OpenJDK are trademarks or registered trademarks of Oracle and/or its affiliates.