Tag Archives: GenerativeAI

Generate Stunning Visuals in Your Android Apps with Imagen 3 via Vertex AI in Firebase

Posted by Thomas Ezan – Sr. Android Developer Relation Engineer (@lethargicpanda)

Imagen 3, our most advanced image generation model, is now available through Vertex AI in Firebase, making it even easier to integrate it into your Android apps.

Designed to generate well-composed images with exceptional details, reduced artifacts, and rich lighting, Imagen 3 represents a significant leap forward in image generation capabilities.

Hot air balloons float over a scenic desert landscape with unique rock formations.
Image generated by Imagen 3 with prompt: “Shot in the style of DSLR camera with the polarizing filter. A photo of two hot air balloons over the unique rock formations in Cappadocia, Turkey. The colors and patterns on these balloons contrast beautifully against the earthy tones of the landscape below. This shot captures the sense of adventure that comes with enjoying such an experience.”

A wooden robot stands in a field of yellow flowers, holding a small blue bird on its outstretched hand.
Image generated by Imagen 3 with prompt: “A weathered, wooden mech robot covered in flowering vines stands peacefully in a field of tall wildflowers, with a small blue bird resting on its outstretched hand. Digital cartoon, with warm colors and soft lines. A large cliff with a waterfall looms behind.”

Imagen 3 unlocks exciting new possibilities for Android developers. Generated visuals can adapt to the content of your app, creating a more engaging user experience. For instance, your users can generate custom artwork to enhance their in-app profile. Imagen can also improve your app's storytelling by bringing its narratives to life with delightful personalized illustrations.

You can experiment with image prompts in Vertex AI Studio, and learn how to improve your prompts by reviewing the prompt and image attribute guide.

Get started with Imagen 3

Integrating Imagen 3 is similar to adding Gemini access via Vertex AI in Firebase. Start by adding the Gradle dependencies to your Android project:

dependencies {
    implementation(platform("com.google.firebase:firebase-bom:33.10.0"))

    implementation("com.google.firebase:firebase-vertexai")
}

Then, in your Kotlin code, create an ImagenModel instance by passing the model name and, optionally, a model configuration and safety settings:

val imageModel = Firebase.vertexAI.imagenModel(
  modelName = "imagen-3.0-generate-001",
  generationConfig = ImagenGenerationConfig(
    imageFormat = ImagenImageFormat.jpeg(compressionQuality = 75),
    addWatermark = true,
    numberOfImages = 1,
    aspectRatio = ImagenAspectRatio.SQUARE_1x1
  ),
  safetySettings = ImagenSafetySettings(
    safetyFilterLevel = ImagenSafetyFilterLevel.BLOCK_LOW_AND_ABOVE,
    personFilterLevel = ImagenPersonFilterLevel.ALLOW_ADULT
  )
)

Finally, generate the image by calling generateImages (a suspend function, so call it from a coroutine):

val imageResponse = imageModel.generateImages(
  prompt = "An astronaut riding a horse"
)

Retrieve the generated image from the imageResponse and display it as a bitmap as follows:

val image = imageResponse.images.first()
val uiImage = image.asBitmap()

Next steps

Explore the comprehensive Firebase documentation for detailed API information.

Access to Imagen 3 using Vertex AI in Firebase is currently in Public Preview, giving you an early opportunity to experiment and innovate. For pricing details, please refer to the Vertex AI in Firebase pricing page.

Start experimenting with Imagen 3 today! We're looking forward to seeing how you’ll leverage Imagen 3's capabilities to create truly unique, immersive and personalized Android experiences.

Production-ready generative AI on Android with Vertex AI in Firebase

Posted by Thomas Ezan – Sr. Developer Relation Engineer (@lethargicpanda)

Gemini can help you build and launch new user features that will boost engagement and create personalized experiences for your users.

The Vertex AI in Firebase SDK lets you access Google’s Gemini Cloud models (like Gemini 1.5 Flash and Gemini 1.5 Pro) and add GenAI capabilities to your Android app. It became generally available last October, which means it is now ready for production, and it is already used by many apps on Google Play.

Here are tips for a successful deployment to production.

Implement App Check to prevent API abuse

When using the Vertex AI in Firebase API, it is crucial to implement robust security measures to prevent unauthorized access and misuse.

Firebase App Check helps protect backend resources (like Vertex AI in Firebase, Cloud Functions for Firebase, or even your own custom backend) from abuse. It does this by attesting that incoming traffic is coming from your authentic app running on an authentic and untampered Android device.

A flow diagram illustrating App Check, with green lines depicting 'User Request' going through App Check to 'Backend'. A red line depicting 'Bad Request' is being blocked by App Check.
Firebase App Check ensures that only legitimate users access your backend resources

To get started, add Firebase to your Android project and enable the Play Integrity API for your app in the Google Play console. Back in the Firebase console, go to the App Check section of your Firebase project to register your app by providing its SHA-256 fingerprint.

Then, update your Android project’s Gradle dependencies with the App Check library for Android:

dependencies {
    // BoM for the Firebase platform
    implementation(platform("com.google.firebase:firebase-bom:33.7.0"))

    // Dependency for App Check
    implementation("com.google.firebase:firebase-appcheck-playintegrity")
}

Finally, in your Kotlin code, initialize App Check before using any other Firebase SDK:

Firebase.initialize(context)
Firebase.appCheck.installAppCheckProviderFactory(
    PlayIntegrityAppCheckProviderFactory.getInstance(),
)

To enhance the security of your generative AI feature, you should implement and enforce App Check before releasing your app to production. Additionally, if your app utilizes other Firebase services like Firebase Authentication, Firestore, or Cloud Functions, App Check provides an extra layer of protection for those resources as well.

Once App Check is enforced, you’ll be able to monitor your app’s requests in the Firebase console.

An area chart of the App Check metrics page in the Firebase console, showing the percentages of verified and unverified requests over several days. Numerical breakdowns of verified (51%) and unverified requests (49%) are shown.
App Check metrics page in the Firebase console

You can learn more about App Check on Android in the Firebase documentation.

Use Remote Config for server-controlled configuration

The generative AI landscape evolves quickly. Every few months, new Gemini model iterations become available and some models are removed. See the Vertex AI in Firebase Gemini models page for details.

Because of this, instead of hardcoding the model name in your app, we recommend storing it in a server-controlled variable with Firebase Remote Config. This allows you to dynamically update the model your app uses without having to deploy a new version of your app or require your users to pick up a new version.

You define parameters that you want to control (like model name) using the Firebase console. Then, you add these parameters into your app, along with default "fallback" values for each parameter. Back in the Firebase console, you can change the value of these parameters at any time. Your app will automatically fetch the new value.

Here's how to implement Remote Config in your app:

// Initialize Remote Config by defining the minimum fetch interval
val remoteConfig: FirebaseRemoteConfig = Firebase.remoteConfig
val configSettings = remoteConfigSettings {
    minimumFetchIntervalInSeconds = 3600
}
remoteConfig.setConfigSettingsAsync(configSettings)

// Set default values defined in your app resources
remoteConfig.setDefaultsAsync(R.xml.remote_config_defaults)

// Fetch the latest values from the server and activate them
remoteConfig.fetchAndActivate()

// Load the model name (the in-app default is returned until a fetched value is activated)
val modelName = remoteConfig.getString("model_name")

Read more about using Remote Config with Vertex AI in Firebase.
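As a small illustration of the "fallback" idea described above, here is a minimal sketch of a helper that guards against a missing or blank Remote Config value. The function name resolveModelName and the fallback value are hypothetical and not part of the Firebase SDK; the sketch only assumes that the remote value arrives as a (possibly empty) string.

```kotlin
// Hypothetical helper (not part of the Firebase SDK): use the remote value
// when it is non-blank, otherwise fall back to a default bundled with the app.
fun resolveModelName(remoteValue: String?, fallback: String): String {
    val candidate = remoteValue?.trim().orEmpty()
    return if (candidate.isNotEmpty()) candidate else fallback
}
```

You could then call it as resolveModelName(remoteConfig.getString("model_name"), "gemini-1.5-flash"), so that the app never tries to initialize a model with an empty name.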

Gather user feedback to evaluate impact

As you roll out your AI-enabled feature to production, it's critical to build feedback mechanisms into your product and allow users to easily signal whether the AI output was helpful, accurate, or relevant. For example, you can incorporate interactive elements such as thumbs-up and thumbs-down buttons and detailed feedback forms within the user interface. The Material Icons in Compose package provides ready-to-use icons to help you implement them.

You can easily track user interactions with these elements as custom analytics events by using the Google Analytics logEvent() function:

Row {
   // Log a positive feedback event
   Button(
      onClick = {
         firebaseAnalytics.logEvent("model_response_feedback") {
            param("feedback", "thumb_up")
         }
      }
   ) {
      Icon(Icons.Default.ThumbUp, contentDescription = "Thumb up")
   }
   // Log a negative feedback event
   Button(
      onClick = {
         firebaseAnalytics.logEvent("model_response_feedback") {
            param("feedback", "thumb_down")
         }
      }
   ) {
      Icon(Icons.Default.ThumbDown, contentDescription = "Thumb down")
   }
}

Learn more about Google Analytics and its event logging capabilities in the Firebase documentation.

User privacy and responsible AI

When you use Vertex AI in Firebase for inference, you have the guarantee that the data sent to Google won’t be used by Google to train AI models (see Vertex AI documentation for details).

It's also important to be transparent with your users when they're engaging with generative AI technology. You should highlight the possibility of unexpected model behavior.

Finally, users should have control within your app over how their activity related to AI model interactions is stored and deleted.

You can learn more about how Google is approaching Generative AI responsibly in the Google Cloud documentation.

Gaze Link Wins Best Android App in Gemini API Developer Competition

Posted by Thomas Ezan – Sr Developer Relation Engineer (@lethargicpanda)

We're excited to announce Gaze Link as the winner of the Best Android App for our Gemini API Developer Competition!

This innovative app demonstrates the potential of the Gemini API in providing a communication system for individuals with Amyotrophic lateral sclerosis (ALS) who develop severe motor and verbal disabilities, enabling them to type sentences with only their eyes.

About Gaze Link

Gaze Link uses Google’s Gemini 1.5 Flash model to predict the user’s intended sentence based on a few key words and the context of the conversation.

For example, if the context is “Is the room temperature ok?” and the user replies “hot AC two”, the app will leverage Gemini to generate the full sentence “I am hot, can you turn the AC down by two degrees?”.

The Gaze Link team took advantage of Gemini 1.5 Flash's multilingual capabilities to let the app generate sentences in English, Spanish and Chinese, the three languages currently supported by the app.

We were truly impressed by the Gaze Link app. The team used the Gemini API combined with ML Kit Face Detection to empower individuals with ALS, providing them with a powerful communication system that is both accessible and affordable.

With Gemini 1.5 Flash currently supporting 38 languages, it is possible for Gaze Link to add support for more languages in the future. In addition, the model’s multimodal abilities could enable the team to enhance the user experience by integrating image, audio and video to augment the context of the conversation.

Build with the Gemini API

The result of the integration of the Gemini API in Gaze Link is inspiring. If you are working on an Android app today, we encourage you to learn about the Gemini API capabilities to see how you can successfully add generative AI to your app and delight your users.

To get started, go to the Android AI documentation!

Gemini API in action: showcase of innovative Android apps

Posted by Thomas Ezan, Sr Developer Relation Engineer

With the advent of Generative AI, Android developers now have access to capabilities that were previously out of reach. For instance, you can now easily add image captioning to your app without any computer vision knowledge.

With the upcoming launch of the stable version of Vertex AI in Firebase in a few weeks (available in beta since Google I/O), you'll be able to effortlessly incorporate the capabilities of Gemini 1.5 Flash and Gemini 1.5 Pro into your app. The inference runs on Google's servers, making these capabilities accessible to any device with an internet connection.

Several Android developers have already begun leveraging this technology. Let's explore some of their creations.


Generate your meals for the week

The team behind Meal Planner, a meal planner and shopping list management app, is leveraging Gemini 1.5 Flash to create original meal plans. Based on the user’s diet, the number of people they are cooking for, and any food allergies or intolerances, the app automatically creates a meal plan for the selected week.

For each dish, the model lists ingredients and quantities, taking into account the number of portions. It also provides instructions on how to prepare it. The app automatically generates a shopping list for the week based on the ingredient list for each meal.

moving image of Meal Planner app user experience

To enable reliable processing of the model’s response and to integrate it in the app, the team leveraged Gemini's JSON mode. They specified responseMimeType = "application/json" in the model configuration and defined the expected JSON response schema in the prompt (see API documentation).

Following the launch of the meal generation feature, Meal Planner received overwhelmingly positive feedback. The feature simplified meal planning for users with dietary restrictions and helped reduce food waste. In the months after its introduction, Meal Planner experienced a 17% surge in premium users.


Journaling while chatting with Leo

A few months ago, the team behind the journal app Life wanted to provide an innovative way to let their users log entries. They created "Leo", an AI diary assistant chatting with users and converting conversations into a journal entry.

To modify the behavior of the model and the tone of its responses, the team used system instructions to define the chatbot persona. This allows the user to set the behavior and tone of the assistant: Pick “Professional and formal” and the model will keep the conversation strict, select “Friendly and cheerful” and it will lighten up the dialogue with lots of emojis!

moving image of Leo app user experience

The team saw an increase in user engagement following the launch of the feature.

And if you want to know more about how the Life developers used the Gemini API in their app, we had a great conversation with Jomin from the team. This conversation is part of a new Android podcast series called Android Build Time, which you can also watch on YouTube.


Create a nickname on the fly

The HiiKER app provides offline hiking maps. The app also fosters a community by letting users rate trails and leave comments. But users signing up don’t always add a username to their profile. To avoid the risk of lowering the conversion rate by making username selection mandatory at signup time, the team decided to use the Gemini API to suggest unique usernames based on the user’s country or area.

moving image of HiiKER app user experience

To generate original usernames, the team set a high temperature value and played with the top-K and top-P values to increase the creativity of the model.
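To build intuition for why these three parameters affect creativity, here is a conceptual sketch in plain Kotlin. It is not how Gemini is implemented and does not use the SDK; the function name filteredDistribution and the toy "logits" map are illustrative assumptions. It shows how a higher temperature flattens the token distribution, while top-K and top-P restrict which candidate tokens remain eligible for sampling.

```kotlin
import kotlin.math.exp

// Conceptual illustration only: NOT Gemini's implementation.
// Returns the renormalized token probabilities that remain after applying
// temperature scaling, then a top-K filter, then a top-P (nucleus) filter.
fun filteredDistribution(
    logits: Map<String, Double>,
    temperature: Double,
    topK: Int,
    topP: Double,
): Map<String, Double> {
    require(logits.isNotEmpty() && temperature > 0.0 && topK > 0 && topP > 0.0)

    // Temperature scaling followed by a numerically stable softmax
    val scaled = logits.mapValues { it.value / temperature }
    val maxLogit = scaled.values.maxOf { it }
    val exps = scaled.mapValues { exp(it.value - maxLogit) }
    val total = exps.values.sum()
    val ranked = exps.mapValues { it.value / total }
        .entries.sortedByDescending { it.value }

    // Top-K: keep only the K most likely tokens
    val topKTokens = ranked.take(topK)

    // Top-P: keep the smallest prefix whose cumulative probability reaches P
    val kept = mutableListOf<Pair<String, Double>>()
    var cumulative = 0.0
    for ((token, probability) in topKTokens) {
        kept += token to probability
        cumulative += probability
        if (cumulative >= topP) break
    }

    // Renormalize so the remaining probabilities sum to 1
    val keptTotal = kept.sumOf { it.second }
    return kept.associate { (token, probability) -> token to probability / keptTotal }
}
```

With a low temperature and a tight top-P, almost all of the probability mass collapses onto the single most likely token; raising the temperature and loosening top-K and top-P leaves more candidates in play, which is the effect the team relied on to get varied usernames.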

This AI-assisted feature led to a significant lift in the percentage of users with "complete" profiles, contributing to a positive impact on engagement and retention.

It’s time to build!

Generative AI is still a very new space and we are just starting to have easy access to these capabilities for Android. From enabling advanced personalization to creating delightful interactive experiences or simplifying signup, you might have unique challenges that you are trying to solve as an Android app developer. It is a great time to start looking at these challenges as opportunities that generative AI can help you tackle!


You can learn more about the advanced features of the Gemini Cloud models, find an introduction to generative AI for Android developers, and get started with the Vertex AI in Firebase documentation.

To learn more about AI on Android, check out other resources we have available during AI on Android Spotlight Week.

Use the #AndroidAI hashtag to share your creations or feedback on social media, and join us at the forefront of the AI revolution!

Advanced capabilities of the Gemini API for Android developers

Posted by Thomas Ezan, Sr Developer Relation Engineer

Thousands of developers across the globe are harnessing the power of the Gemini 1.5 Pro and Gemini 1.5 Flash models to infuse advanced generative AI features into their applications. Android developers are no exception, and with the upcoming launch of the stable version of Vertex AI in Firebase in a few weeks (available in beta since Google I/O), it's the perfect time to explore how your app can benefit from it. We just published a codelab to help you get started.

Let's dive deep into some advanced capabilities of the Gemini API that go beyond simple text prompting and discover the exciting use cases they can unlock in your Android app.

Shaping AI behavior with system instructions

System instructions serve as a "preamble" that you incorporate before the user prompt. This enables shaping the model's behavior to align with your specific requirements and scenarios. You set the instructions when you initialize the model, and then those instructions persist through all interactions with the model, across multiple user and model turns.

For example, you can use system instructions to:

    • Define a persona or role for a chatbot (e.g., “explain like I am 5”)
    • Specify the output format of the response (e.g., Markdown, YAML, etc.)
    • Set the output style and tone (e.g., verbosity, formality, etc.)
    • Define the goals or rules for the task (e.g., “return a code snippet without further explanation”)
    • Provide additional context for the prompt (e.g., a knowledge cutoff date)

To use system instructions in your Android app, pass them as a parameter when you initialize the model:

val generativeModel = Firebase.vertexAI.generativeModel(
  modelName = "gemini-1.5-flash",
  ...
  systemInstruction = 
    content { text("You are a knowledgeable tutor. Answer the questions using the socratic tutoring method.") }
)

You can learn more about system instructions in the Vertex AI in Firebase documentation.

You can also easily test your prompt with different system instructions in Vertex AI Studio, the Google Cloud console tool for rapidly prototyping and testing prompts with Gemini models.


test system instructions with your prompts in Vertex AI Studio
Vertex AI Studio lets you test system instructions with your prompts

When you are ready to go to production, it is recommended to target a specific version of the model (e.g. gemini-1.5-flash-002). But as new model versions are released and previous ones are deprecated, it is advised to use Firebase Remote Config to be able to update the version of the Gemini model without releasing a new version of your app.

Beyond chatbots: leveraging generative AI for advanced use cases

While chatbots are a popular application of generative AI, the capabilities of the Gemini API go beyond conversational interfaces and you can integrate multimodal GenAI-enabled features into various aspects of your Android app.

Many tasks that previously required human intervention (such as analyzing text, image, or video content, synthesizing data into a human-readable format, or engaging in a creative process to generate new content) can potentially be automated using GenAI.

Gemini JSON support

Android apps don’t interface well with natural language outputs. Conversely, JSON is ubiquitous in Android development, and provides a more structured way for Android apps to consume input. However, ensuring proper key/value formatting when working with generative models can be challenging.

With the general availability of Vertex AI in Firebase, we implemented solutions to streamline JSON generation with proper key/value formatting:

Response MIME type identifier

If you have tried generating JSON with a generative AI model, it's likely you have found yourself with unwanted extra text that makes the JSON parsing more challenging.

For example:

Sure, here is your JSON:
```
{
   "someKey": "someValue",
   ...
}
```

When using Gemini 1.5 Pro or Gemini 1.5 Flash, you can explicitly set the model’s response MIME type to application/json in the generation configuration and instruct the model to generate well-structured JSON output.

val generativeModel = Firebase.vertexAI.generativeModel(
  modelName = "gemini-1.5-flash",
  
  generationConfig = generationConfig {
     responseMimeType = "application/json"
  }
)

Review the API reference for more details.
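Setting the response MIME type should make post-processing unnecessary. Still, if you need to handle replies produced without it, a defensive fallback along these lines can help. This is a minimal sketch in plain Kotlin; the function name extractJson is hypothetical and not part of the SDK. It simply returns the outermost JSON object or array found in the raw text, and it assumes the surrounding chatter does not itself contain braces or brackets.

```kotlin
// Hypothetical helper (not part of the SDK): returns the outermost {...} or [...]
// span found in a raw model reply, or null when no such span exists.
// Assumes the text around the JSON does not itself contain braces or brackets.
fun extractJson(raw: String): String? {
    val start = raw.indexOfFirst { it == '{' || it == '[' }
    if (start == -1) return null
    val closing = if (raw[start] == '{') '}' else ']'
    val end = raw.lastIndexOf(closing)
    return if (end > start) raw.substring(start, end + 1) else null
}
```

The returned string still has to be parsed and validated by your JSON library of choice; this helper only strips the preamble and the code fence markers around the payload.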

Soon, the Android SDK for Vertex AI in Firebase will enable you to define the JSON schema expected in the response.


Multimodal capabilities

Both Gemini 1.5 Flash and Gemini 1.5 Pro are multimodal models. This means that they can process input in multiple formats, including text, images, audio, and video. In addition, they both have long context windows, capable of handling up to 1 million tokens for Gemini 1.5 Flash and 2 million tokens for Gemini 1.5 Pro.

These features open doors to innovative functionalities that were previously inaccessible, such as automatically generating descriptive captions for images, identifying topics in a conversation and generating chapters from an audio file, or describing the scenes and actions in a video file.

You can pass an image to the model as shown in this example:

val contentResolver = applicationContext.contentResolver
contentResolver.openInputStream(imageUri).use { stream ->
  stream?.let {
    val bitmap = BitmapFactory.decodeStream(stream)

    // Provide a prompt that includes the image specified above and text
    val prompt = content {
      image(bitmap)
      text("How many people are in this picture?")
    }

    // Send the prompt while it is still in scope
    val response = generativeModel.generateContent(prompt)
  }
}

You can also pass a video to the model:

val contentResolver = applicationContext.contentResolver
contentResolver.openInputStream(videoUri).use { stream ->
  stream?.let {
    val bytes = stream.readBytes()

    // Provide a prompt that includes the video specified above and text
    val prompt = content {
        blob("video/mp4", bytes)
        text("What is in the video?")
    }

    val fullResponse = generativeModel.generateContent(prompt)
  }
}

You can learn more about multimodal prompting in the Vertex AI in Firebase documentation.

Note: This method enables you to pass files up to 20 MB. For larger files, use Cloud Storage for Firebase and include the file’s URL in your multimodal request. Read the documentation for more information.
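The size rule in the note above can be sketched as a simple check made before building the request. This is a minimal illustration in plain Kotlin, not SDK code: the names MAX_INLINE_BYTES, UploadStrategy and chooseUploadStrategy are hypothetical, and reading "20 MB" as 20 * 1024 * 1024 bytes is an assumption, so check the documentation for the exact limit and leave headroom for the rest of the request.

```kotlin
// Assumption: "20 MB" is interpreted as 20 * 1024 * 1024 bytes.
// Check the documentation for the exact limit that applies to your request.
const val MAX_INLINE_BYTES: Long = 20L * 1024 * 1024

// Hypothetical names, used only for this illustration
enum class UploadStrategy { INLINE, CLOUD_STORAGE }

// Small files can be sent inline with the prompt; larger ones should be
// uploaded to Cloud Storage for Firebase and referenced by URL instead.
fun chooseUploadStrategy(sizeInBytes: Long): UploadStrategy {
    require(sizeInBytes >= 0) { "File size cannot be negative" }
    return if (sizeInBytes <= MAX_INLINE_BYTES) UploadStrategy.INLINE
    else UploadStrategy.CLOUD_STORAGE
}
```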

Function calling: Extending the model's capabilities

Function calling enables you to extend the capabilities of generative models. For example, you can enable the model to retrieve information from your SQL database and feed it back into the context of the prompt. You can also let the model trigger actions by calling the functions in your app source code. In essence, function calls bridge the gap between the Gemini models and your Kotlin code.

Take the example of a food delivery application that is interested in implementing a conversational interface with Gemini 1.5 Flash. Assume that this application has a getFoodOrder(cuisine: String) function that returns the list of orders from the user for a specific type of cuisine:

fun getFoodOrder(cuisine: String) : JSONObject {
        // implementation…  
}

Note that the function, to be usable by the model, needs to return the response in the form of a JSONObject.

To make the response available to Gemini 1.5 Flash, create a definition of your function that the model will be able to understand using defineFunction:

val getOrderListFunction = defineFunction(
            name = "getOrderList",
            description = "Get the list of food orders from the user for a defined type of cuisine.",
            Schema.str(name = "cuisineType", description = "the type of cuisine for the order")
        ) {  cuisineType ->
            getFoodOrder(cuisineType)
        }

Then, when you instantiate the model, share this function definition with the model using the tools parameter:

val generativeModel = Firebase.vertexAI.generativeModel(
    modelName = "gemini-1.5-flash",
    ...
    tools = listOf(Tool(listOf(getOrderListFunction)))
)

Finally, when you get a response from the model, check in the response if the model is actually requesting to execute the function:

// Send the message to the generative model
var response = chat.sendMessage(prompt)

// Check if the model responded with a function call
response.functionCall?.let { functionCall ->
  // Try to retrieve the stored lambda from the model's tools and
  // throw an exception if the returned function was not declared
  val matchedFunction = generativeModel.tools?.flatMap { it.functionDeclarations }
      ?.firstOrNull { it.name == functionCall.name }
      ?: throw InvalidStateException("Function not found: ${functionCall.name}")
  
  // Call the lambda retrieved above
  val apiResponse: JSONObject = matchedFunction.execute(functionCall)

  // Send the API response back to the generative model
  // so that it generates a text response that can be displayed to the user
  response = chat.sendMessage(
    content(role = "function") {
        part(FunctionResponsePart(functionCall.name, apiResponse))
    }
  )
}

// If the model responds with text, show it in the UI
response.text?.let { modelResponse ->
    println(modelResponse)
}

To summarize, you’ll provide the functions (or tools) to the model at initialization:

A flow diagram shows a green box labeled 'Generative Model' connected to a list of model parameters and a list of tools. The parameters include 'gemini-1.5-flash', 'api_key', and 'configuration', while the tools are 'getOrderList()', 'getDate()', and 'placeOrder()'.

And when appropriate, the model will request to execute the appropriate function and provide the results:

A flow diagram illustrating the interaction between an Android app and a 'Generative Model'. The app sends 'getDate()' and 'getOrderList()' requests.

You can read more about function calling in the Vertex AI in Firebase documentation.
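The lookup-and-execute step described in this section can be illustrated independently of the SDK. The following is a minimal sketch in plain Kotlin under stated assumptions: FunctionCallRequest and FunctionRegistry are hypothetical stand-ins for the SDK types, arguments are simplified to string maps, and handlers return plain strings instead of JSONObject.

```kotlin
// Hypothetical stand-in for the function call the model sends back
data class FunctionCallRequest(val name: String, val args: Map<String, String>)

// Hypothetical registry that maps function names to Kotlin lambdas
class FunctionRegistry {
    private val handlers = mutableMapOf<String, (Map<String, String>) -> String>()

    // Declare a function the model is allowed to request
    fun register(name: String, handler: (Map<String, String>) -> String) {
        handlers[name] = handler
    }

    // Run the requested handler, or fail loudly if it was never declared
    fun dispatch(call: FunctionCallRequest): String {
        val handler = handlers[call.name]
            ?: throw IllegalStateException("Function not found: ${call.name}")
        return handler(call.args)
    }
}
```

In this sketch, an app would register a handler such as getOrderList once at startup, call dispatch whenever the model asks for a function, and then send the returned value back to the model as the function response.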

Unlocking the potential of the Gemini API in your app

The Gemini API offers a treasure trove of advanced features that empower Android developers to craft truly innovative and engaging applications. By going beyond basic text prompts and exploring the capabilities highlighted in this blog post, you can create AI-powered experiences that delight your users and set your app apart in the competitive Android landscape.

Read more about how some Android apps are already starting to leverage the Gemini API.


To learn more about AI on Android, check out other resources we have available during AI on Android Spotlight Week.

Use the #AndroidAI hashtag to share your creations or feedback on social media, and join us at the forefront of the AI revolution!


The code snippets in this blog post have the following license:

// Copyright 2024 Google LLC.
// SPDX-License-Identifier: Apache-2.0

Advanced capabilities of the Gemini API for Android developers

Posted by Thomas Ezan, Sr Developer Relation Engineer

Thousands of developers across the globe are harnessing the power of the Gemini 1.5 Pro and Gemini 1.5 Flash models to infuse advanced generative AI features into their applications. Android developers are no exception, and with the upcoming launch of the stable version of Vertex AI in Firebase in a few weeks (available in Beta since Google I/O), it's the perfect time to explore how your app can benefit from it. We just published a codelab to help you get started.

Let's dive deep into some advanced capabilities of the Gemini API that go beyond simple text prompting and discover the exciting use cases they can unlock in your Android app.

Shaping AI behavior with system instructions

System instructions serve as a "preamble" that you incorporate before the user prompt. This enables shaping the model's behavior to align with your specific requirements and scenarios. You set the instructions when you initialize the model, and then those instructions persist through all interactions with the model, across multiple user and model turns.

For example, you can use system instructions to:

    • Define a persona or role for a chatbot (e.g., “explain like I am 5”)
    • Specify the output format of the response (e.g., Markdown, YAML, etc.)
    • Set the output style and tone (e.g., verbosity, formality, etc.)
    • Define the goals or rules for the task (e.g., “return a code snippet without further explanation”)
    • Provide additional context for the prompt (e.g., a knowledge cutoff date)

To use system instructions in your Android app, pass them as a parameter when you initialize the model:

val generativeModel = Firebase.vertexAI.generativeModel(
  modelName = "gemini-1.5-flash",
  ...
  systemInstruction = 
    content { text("You are a knowledgeable tutor. Answer the questions using the socratic tutoring method.") }
)

You can learn more about system instructions in the Vertex AI in Firebase documentation.

You can also easily test your prompt with different system instructions in Vertex AI Studio, the Google Cloud console tool for rapidly prototyping and testing prompts with Gemini models.


test system instructions with your prompts in Vertex AI Studio
Vertex AI Studio lets you test system instructions with your prompts

When you are ready to go to production, we recommend targeting a specific version of the model (e.g. gemini-1.5-flash-002). Because new model versions are released and previous ones are deprecated, consider using Firebase Remote Config so that you can update the version of the Gemini model without releasing a new version of your app.

Beyond chatbots: leveraging generative AI for advanced use cases

While chatbots are a popular application of generative AI, the capabilities of the Gemini API go beyond conversational interfaces and you can integrate multimodal GenAI-enabled features into various aspects of your Android app.

Many tasks that previously required human intervention (such as analyzing text, image, or video content, synthesizing data into a human-readable format, or engaging in a creative process to generate new content) can potentially be automated using GenAI.

Gemini JSON support

Android apps don’t interface well with natural language outputs. Conversely, JSON is ubiquitous in Android development, and provides a more structured way for Android apps to consume input. However, ensuring proper key/value formatting when working with generative models can be challenging.

With the general availability of Vertex AI in Firebase, we implemented solutions to streamline JSON generation with proper key/value formatting:

Response MIME type identifier

If you have tried generating JSON with a generative AI model, it's likely you have found yourself with unwanted extra text that makes the JSON parsing more challenging.

For example:

Sure, here is your JSON:
```
{
   "someKey": "someValue",
   ...
}
```
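
Parsing a reply like the one above means stripping the surrounding chatter first. A rough sketch of that manual cleanup, written in Python for brevity; the helper name `extract_json` is hypothetical and not part of any SDK, and the approach assumes the reply contains a single top-level JSON object:

```python
import json

FENCE = "`" * 3  # Markdown code fence, built here to avoid nesting fences


def extract_json(reply: str) -> dict:
    """Parse the span from the first '{' to the last '}' of a chatty reply."""
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    return json.loads(reply[start:end + 1])


# A reply shaped like the example above: preamble, fence, JSON, fence.
reply = f'Sure, here is your JSON:\n{FENCE}\n{{"someKey": "someValue"}}\n{FENCE}'
data = extract_json(reply)
print(data["someKey"])
```

Heuristics like this are brittle (a stray brace in the preamble is enough to break them), which is why asking the model for JSON directly is preferable.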

When using Gemini 1.5 Pro or Gemini 1.5 Flash, you can explicitly set the model’s response MIME type to application/json in the generation configuration and instruct the model to generate well-structured JSON output.

val generativeModel = Firebase.vertexAI.generativeModel(
  modelName = "gemini-1.5-flash",
  
  generationConfig = generationConfig {
     responseMimeType = "application/json"
  }
)

Review the API reference for more details.

Soon, the Android SDK for Vertex AI in Firebase will enable you to define the JSON schema expected in the response.


Multimodal capabilities

Both Gemini 1.5 Flash and Gemini 1.5 Pro are multimodal models. This means that they can process input in multiple formats, including text, images, audio, and video. In addition, they both have long context windows, capable of handling up to 1 million tokens for Gemini 1.5 Flash and 2 million tokens for Gemini 1.5 Pro.

These features open doors to innovative functionalities that were previously inaccessible, such as automatically generating descriptive captions for images, identifying topics in a conversation and generating chapters from an audio file, or describing the scenes and actions in a video file.

You can pass an image to the model as shown in this example:

val contentResolver = applicationContext.contentResolver
contentResolver.openInputStream(imageUri).use { stream ->
  stream?.let {
    val bitmap = BitmapFactory.decodeStream(stream)

    // Provide a prompt that includes the image specified above and text
    val prompt = content {
      image(bitmap)
      text("How many people are in this picture?")
    }

    // generateContent is a suspend function; call it from a coroutine.
    // The prompt is only in scope inside this block.
    val response = generativeModel.generateContent(prompt)
  }
}

You can also pass a video to the model:

val contentResolver = applicationContext.contentResolver
contentResolver.openInputStream(videoUri).use { stream ->
  stream?.let {
    val bytes = stream.readBytes()

    // Provide a prompt that includes the video specified above and text
    val prompt = content {
        blob("video/mp4", bytes)
        text("What is in the video?")
    }

    val fullResponse = generativeModel.generateContent(prompt)
  }
}

You can learn more about multimodal prompting in the Vertex AI in Firebase documentation.

Note: This method enables you to pass files up to 20 MB. For larger files, use Cloud Storage for Firebase and include the file’s URL in your multimodal request. Read the documentation for more information.

Function calling: Extending the model's capabilities

Function calling enables you to extend the capabilities of generative models. For example, you can enable the model to retrieve information from your SQL database and feed it back into the context of the prompt. You can also let the model trigger actions by calling functions in your app's source code. In essence, function calls bridge the gap between the Gemini models and your Kotlin code.

Take the example of a food delivery application that wants to implement a conversational interface with Gemini 1.5 Flash. Assume that this application has a getFoodOrder(cuisine: String) function that returns the list of the user's orders for a specific type of cuisine:

fun getFoodOrder(cuisine: String) : JSONObject {
        // implementation…  
}

Note that the function, to be usable by the model, needs to return the response in the form of a JSONObject.

To make the response available to Gemini 1.5 Flash, create a definition of your function that the model will be able to understand using defineFunction:

val getOrderListFunction = defineFunction(
            name = "getOrderList",
            description = "Get the list of food orders from the user for a given type of cuisine.",
            Schema.str(name = "cuisineType", description = "the type of cuisine for the order")
        ) {  cuisineType ->
            getFoodOrder(cuisineType)
        }

Then, when you instantiate the model, share this function definition with the model using the tools parameter:

val generativeModel = Firebase.vertexAI.generativeModel(
    modelName = "gemini-1.5-flash",
    ...
    tools = listOf(Tool(listOf(getOrderListFunction)))
)

Finally, when you get a response from the model, check in the response if the model is actually requesting to execute the function:

// Send the message to the generative model
var response = chat.sendMessage(prompt)

// Check if the model responded with a function call
response.functionCall?.let { functionCall ->
  // Try to retrieve the stored lambda from the model's tools and
  // throw an exception if the returned function was not declared
  val matchedFunction = generativeModel.tools?.flatMap { it.functionDeclarations }
      ?.firstOrNull { it.name == functionCall.name }
      ?: throw InvalidStateException("Function not found: ${functionCall.name}")
  
  // Call the lambda retrieved above
  val apiResponse: JSONObject = matchedFunction.execute(functionCall)

  // Send the API response back to the generative model
  // so that it generates a text response that can be displayed to the user
  response = chat.sendMessage(
    content(role = "function") {
        part(FunctionResponsePart(functionCall.name, apiResponse))
    }
  )
}

// If the model responds with text, show it in the UI
response.text?.let { modelResponse ->
    println(modelResponse)
}

To summarize, you provide the functions (or tools) to the model at initialization:

A flow diagram shows a green box labeled 'Generative Model' connected to a list of model parameters and a list of tools. The parameters include 'gemini-1.5-flash', 'api_key', and 'configuration', while the tools are 'getOrderList()', 'getDate()', and 'placeOrder()'.

And when appropriate, the model will request to execute the appropriate function and provide the results:

A flow diagram illustrating the interaction between an Android app and a 'Generative Model'. The app sends 'getDate()' and 'getOrderList()' requests.

You can read more about function calling in the Vertex AI in Firebase documentation.

Unlocking the potential of the Gemini API in your app

The Gemini API offers a treasure trove of advanced features that empower Android developers to craft truly innovative and engaging applications. By going beyond basic text prompts and exploring the capabilities highlighted in this blog post, you can create AI-powered experiences that delight your users and set your app apart in the competitive Android landscape.

Read more about how some Android apps are already starting to leverage the Gemini API.


To learn more about AI on Android, check out other resources we have available during AI on Android Spotlight Week.

Use the #AndroidAI hashtag to share your creations or feedback on social media, and join us at the forefront of the AI revolution!


The code snippets in this blog post have the following license:

// Copyright 2024 Google LLC.
// SPDX-License-Identifier: Apache-2.0

PyTorch machine learning models on Android

Posted by Paul Ruiz – Senior Developer Relations Engineer

Earlier this year we launched Google AI Edge, a suite of tools with easy access to ready-to-use ML tasks, frameworks that enable you to build ML pipelines, and run popular LLMs and custom models – all on-device. For AI on Android Spotlight Week, the Google team is highlighting various ways that Android developers can use machine learning to help improve their applications.

In this post, we'll dive into Google AI Edge Torch, which enables you to convert PyTorch models to run locally on Android and other platforms, using the Google AI Edge LiteRT (formerly TensorFlow Lite) and MediaPipe Tasks libraries. For insights on other powerful tools, be sure to explore the rest of the AI on Android Spotlight Week content.

To make it easier to get started with Google AI Edge, we've provided samples on GitHub as an executable codelab. They demonstrate how to convert the MobileViT model for image classification (compatible with MediaPipe Tasks) and the DIS model for segmentation (compatible with LiteRT).

a red Android figurine is shown next to a black and white silhouette of the same figure, labeled 'Original Image' and 'PT Mask' respectively, demonstrating image segmentation.
DIS model output

This blog guides you through how to use the MobileViT model with MediaPipe Tasks. Keep in mind that the LiteRT runtime provides similar capabilities, enabling you to build custom pipelines and features.

Convert MobileViT model for image classification compatible with MediaPipe Tasks

Once you've installed the necessary dependencies and utilities for your app, the first step is to retrieve the PyTorch model you wish to convert, along with any other MobileViT components you might need (such as an image processor for testing).

from transformers import MobileViTImageProcessor, MobileViTForImageClassification

hf_model_path = 'apple/mobilevit-small'
processor = MobileViTImageProcessor.from_pretrained(hf_model_path)
pt_model = MobileViTForImageClassification.from_pretrained(hf_model_path)

Since the end result of this tutorial should work with MediaPipe Tasks, take an extra step to match the expected input and output shapes for image classification to what is used by the MediaPipe image classification Task.

import torch
from torch import nn

class HF2MP_ImageClassificationModelWrapper(nn.Module):

  def __init__(self, hf_image_classification_model, hf_processor):
    super().__init__()
    self.model = hf_image_classification_model
    if hf_processor.do_rescale:
      self.rescale_factor = hf_processor.rescale_factor
    else:
      self.rescale_factor = 1.0

  def forward(self, image: torch.Tensor):
    # BHWC -> BCHW.
    image = image.permute(0, 3, 1, 2)
    # RGB -> BGR.
    image = image.flip(dims=(1,))
    # Scale [0, 255] -> [0, 1].
    image = image * self.rescale_factor
    logits = self.model(pixel_values=image).logits  # [B, 1000] float32.
    # Softmax is required for MediaPipe classification model.
    logits = torch.nn.functional.softmax(logits, dim=-1)

    return logits

hf_model_path = 'apple/mobilevit-small'
hf_mobile_vit_processor = MobileViTImageProcessor.from_pretrained(hf_model_path)
hf_mobile_vit_model = MobileViTForImageClassification.from_pretrained(hf_model_path)
wrapped_pt_model = HF2MP_ImageClassificationModelWrapper(
    hf_mobile_vit_model, hf_mobile_vit_processor).eval()
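
The tensor bookkeeping in forward can be easier to follow on a tiny example. The sketch below mirrors the same steps (layout change, channel flip, rescale, softmax) on nested Python lists instead of tensors. The helper names are illustrative only, and the 1/255 factor is an assumed value for the processor's rescale_factor:

```python
import math


def bhwc_to_bchw(batch):
    """Reorder image[b][h][w][c] into image[b][c][h][w]."""
    out = []
    for sample in batch:
        channels = len(sample[0][0])
        out.append(
            [[[px[c] for px in row] for row in sample] for c in range(channels)]
        )
    return out


def flip_channels(batch_bchw):
    """Reverse the channel axis (RGB -> BGR)."""
    return [sample[::-1] for sample in batch_bchw]


def rescale(batch_bchw, factor=1 / 255):
    """Scale every value, e.g. [0, 255] -> [0, 1] (factor is an assumption)."""
    return [
        [[[v * factor for v in row] for row in ch] for ch in sample]
        for sample in batch_bchw
    ]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# One 2x2 RGB image: image[b][h][w] is an [R, G, B] pixel.
image = [[[[255, 0, 0], [0, 255, 0]],
          [[0, 0, 255], [255, 255, 255]]]]

prepared = rescale(flip_channels(bhwc_to_bchw(image)))
probs = softmax([2.0, 1.0, 0.1])
```

After these steps, prepared has shape 1 x 3 x 2 x 2 with the blue plane first, and probs is a list of class scores that can be read as probabilities.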

Whether you plan to use the converted MobileViT model with MediaPipe Tasks or LiteRT, the next step is to convert the model to the .tflite format.

First, match the input shape. In this example, the input shape is 1, 256, 256, 3 for a 256x256 pixel three-channel RGB image.

Then, call AI Edge Torch's convert function to complete the conversion process.

import ai_edge_torch

sample_args = (torch.rand((1, 256, 256, 3)),)
edge_model = ai_edge_torch.convert(wrapped_pt_model, sample_args)

After converting the model, you can further refine it by incorporating metadata for the image classification labels. MediaPipe Tasks will utilize this metadata to display or return pertinent information after classification.

from mediapipe.tasks.python.metadata.metadata_writers import image_classifier
from mediapipe.tasks.python.metadata.metadata_writers import metadata_writer
from mediapipe.tasks.python.vision.image_classifier import ImageClassifier
from pathlib import Path

flatbuffer_file = Path('hf_mobile_vit_mp_image_classification_raw.tflite')
edge_model.export(flatbuffer_file)
tflite_model_buffer = flatbuffer_file.read_bytes()

# Extract the image classification labels from the HF model for later integration into the TFLite model.
labels = list(hf_mobile_vit_model.config.id2label.values())

writer = image_classifier.MetadataWriter.create(
    tflite_model_buffer,
    input_norm_mean=[0.0], #  Normalization is not needed for this model.
    input_norm_std=[1.0],
    labels=metadata_writer.Labels().add(labels),
)
tflite_model_buffer, _ = writer.populate()

With all of that completed, it's time to integrate your model into an Android app. If you're following the official Colab notebook, this involves saving the model locally. For an example of image classification with MediaPipe Tasks, explore the GitHub repository. You can find more information in the official Google AI Edge documentation.
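
Saving the model locally is a plain file write. A minimal sketch, assuming tflite_model_buffer holds the bytes returned by writer.populate(); the placeholder bytes, the helper name save_model, and the output filename are all hypothetical:

```python
from pathlib import Path
import tempfile


def save_model(buffer: bytes, path: Path) -> int:
    """Write the model bytes to disk and return the number of bytes written."""
    path.parent.mkdir(parents=True, exist_ok=True)
    return path.write_bytes(buffer)


# Placeholder standing in for the real buffer from writer.populate().
tflite_model_buffer = b"placeholder-model-bytes"

# A temporary directory keeps the sketch self-contained; use your own path.
out_dir = Path(tempfile.mkdtemp())
model_path = out_dir / "hf_mobile_vit_mp_image_classification.tflite"
written = save_model(tflite_model_buffer, model_path)
```

In an Android project, the saved file is typically copied into the app module's assets folder so that the classifier can load it at runtime.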

moving image of Newly converted ViT model with MediaPipe Tasks
Newly converted ViT model with MediaPipe Tasks

After understanding how to convert a simple image classification model, you can use the same techniques to adapt various PyTorch models for Google AI Edge LiteRT or MediaPipe Tasks tooling on Android.

For further model optimization, consider methods like quantizing during conversion. Check out the GitHub example to learn more about how to convert a PyTorch image segmentation model to LiteRT and quantize it.
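
To build intuition for what quantization trades away, here is a framework-free sketch of affine int8 quantization. It is a generic illustration of the idea, not the AI Edge Torch quantization API, and the function names are hypothetical:

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers using a shared scale and zero point."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    # Make sure 0.0 is inside the represented range.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale
    zero_point = round(qmin - lo / scale)
    quantized = [
        max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from the integer representation."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q_weights, scale, zero_point = quantize(weights)
restored = dequantize(q_weights, scale, zero_point)
```

Each weight now fits in one byte instead of the four used by float32, at the cost of a rounding error of at most about one quantization step (the value of scale).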

What's Next

To keep up to date on Google AI Edge developments, look for announcements on the Google for Developers YouTube channel and blog.

We look forward to hearing about how you're using these features in your projects. Use the #AndroidAI hashtag to share your feedback or what you've built on social media, and check out other content in AI on Android Spotlight Week!

How to bring your AI Model to Android devices

Posted by Kateryna Semenova – Senior Developer Relations Engineer and Mark Sherwood – Senior Product Manager

During AI on Android Spotlight Week, we're diving into how you can bring your own AI model to Android-powered devices such as phones, tablets, and beyond. By leveraging the tools and technologies available from Google and other sources, you can run sophisticated AI models directly on these devices, opening up exciting possibilities for better performance, privacy, and usability.

Understanding on-device AI

On-device AI involves deploying and executing machine learning or generative AI models directly on hardware devices, instead of relying on cloud-based servers. This approach offers several advantages, such as reduced latency, enhanced privacy, cost saving and less dependence on internet connectivity.

For generative text use cases, explore Gemini Nano, which is now available in experimental access through its SDK. For many on-device AI use cases, you might want to package your own models in your app. Today we will walk through how to do so on Android.

Key resources for on-device AI

The Google AI Edge platform provides a comprehensive ecosystem for building and deploying AI models on edge devices. It supports various frameworks and tools, enabling developers to integrate AI capabilities seamlessly into their applications. The Google AI Edge platform consists of:

    • MediaPipe Tasks - Cross-platform low-code APIs to tackle common generative AI, vision, text, and audio tasks
    • LiteRT (formerly known as TensorFlow Lite) - Lightweight runtime for deploying custom machine learning models on Android
    • MediaPipe Framework - Pipeline framework for chaining multiple ML models along with pre and post processing logic


Google AI Edge Logo

How to build custom AI features on Android

    1. Define your use case: Before diving into technical details, it's crucial to clearly define what you want your AI feature to achieve. Whether you're aiming for image classification, natural language processing, or another application, having a well-defined goal will guide your development process.

    2. Choose the right tools and frameworks: Depending on your use case, you might be able to use an out of the box solution or you might need to create or source your own model. Look through MediaPipe Tasks for common solutions such as gesture recognition, image segmentation or face landmark detection. If you find a solution that aligns with your needs, you can proceed directly to the testing and deployment step.


Google AI Edge Logo

    If you need to create or source a custom model for your use case, you will need an on-device ML framework such as LiteRT (formerly TensorFlow Lite). LiteRT is designed specifically for mobile and edge devices and provides a lightweight runtime for deploying machine learning models. Simply follow these substeps:

        a. Develop and train your model: Develop your AI model using your chosen framework. Training can be performed on a powerful machine or cloud environment, but the model should be optimized for deployment on a device. Techniques like quantization and pruning can help reduce the model size and improve inference speed. Model Explorer can help understand and explore your model as you're working with it.

        b. Convert and optimize the model: Once your model is trained, convert it to a format suitable for on-device deployment. LiteRT, for example, requires conversion to its specific format. Optimization tools can help reduce the model’s footprint and enhance performance. AI Edge Torch allows you to convert PyTorch models to run locally on Android and other platforms, using Google AI Edge LiteRT and MediaPipe Tasks libraries.

        c. Accelerate your model: You can speed up model inference on Android by using the GPU and NPU. LiteRT’s GPU delegate allows you to run your model on the GPU today. We’re working hard on building the next generation of GPU and NPU delegates that will make your models run even faster, and enable more models to run on GPU and NPU. We’d like to invite you to participate in our early access program to try out this new GPU and NPU infrastructure. We will select participants on a rolling basis, so don’t wait to reach out.

    3. Test and deploy: To ensure that your model delivers the expected performance across various devices, rigorous testing is crucial. Deploy your app to users after completing the testing phase, offering them a seamless and efficient AI experience. We're working on bringing the benefits of Google Play and Android App Bundles to delivering custom ML models for on-device AI features. Play for On-device AI takes the complexity out of launching, targeting, versioning, downloading, and updating on-device models so that you can offer your users a better user experience without compromising your app's size and at no additional cost. Complete this form to express interest in joining the Play for On-device AI early access program.

Build trust in AI through privacy and transparency

With the growing role of AI in everyday life, ensuring models run as intended on devices is crucial. We're emphasizing a "zero trust" approach, providing developers with tools to verify device integrity and user control over their data. In the zero trust approach, developers need the ability to make informed decisions about the device's trustworthiness.

The Play Integrity API is recommended for developers looking to verify their app, server requests, and the device environment (and, soon, the recency of security updates on the device). You can call the API at important moments before your app’s backend decides to download and run your models. You can also consider turning on integrity checks for installing your app to reduce your app’s distribution to unknown and untrusted environments.

Play Integrity API makes use of Android Platform Key Attestation to verify hardware components and generate integrity verdicts across the fleet, eliminating the need for most developers to directly integrate different attestation tools and reducing device ecosystem complexity. Developers can use one or both of these tools to assess device security and software integrity before deciding whether to trust a device to run AI models.

Conclusion

Bringing your own AI model to a device involves several steps, from defining your use case to deploying and testing the model. With resources like Google AI Edge, developers have access to powerful tools and insights to make this process smoother and more effective. As on-device AI continues to evolve, leveraging these resources will enable you to create cutting-edge applications that offer enhanced performance, privacy, and user experience. We are currently seeking early access partners to try out some of our latest tools and APIs at Google AI Edge. Simply fill in this form to connect and explore how we can work together to make your vision a reality.

Dive into these resources and start exploring the potential of on-device AI—your next big innovation could be just a model away!

Use the #AndroidAI hashtag to share your feedback or what you've built on social media and catch up with the rest of the updates being shared during Spotlight Week: AI on Android.

An introduction to privacy and safety for Gemini Nano

Posted by Terence Zhang – Developer Relations Engineer, and Adrien Couque – Software Engineer

AI can enhance the user experience and productivity of Android apps. If you're looking to build GenAI features that benefit from additional data privacy or offline inference, on-device GenAI is a good choice as it processes prompts directly on your device without any server calls.

Gemini Nano is the most efficient model in Google's Gemini family, and Android’s foundational model for running on-device GenAI. It's supported by AICore, a system service that works behind the scenes to centralize the model’s runtime, ensure its safe execution, and protect your privacy. With Gemini Nano, apps can offer more personalized and reliable AI experiences without sending your data off the device.

In this blog post, we'll provide an introductory look into how Gemini Nano and AICore work together to deliver powerful on-device AI capabilities while prioritizing users’ privacy and safety.

Private Compute Core (PCC) compliance

At Google I/O 2021, we introduced Private Compute Core (PCC), a secure environment designed to keep your data private. At I/O in 2024, we shared that AICore is PCC compliant, meaning that it operates under strict privacy rules. It can only interact with a limited set of other system packages that are also PCC compliant, and it cannot directly access the internet. Any requests to download models or other information are routed through a separate, open-source companion APK called Private Compute Services.

This framework helps protect your privacy while still allowing apps to benefit from the power of Gemini Nano. Consider a keyboard application using Gemini Nano for a reply suggestion feature. Without PCC, the keyboard would require direct access to the conversation context. With PCC, the code that has access to the conversation runs in a secure sandbox and interacts directly with Gemini Nano to generate suggestions on behalf of the keyboard. This allows the keyboard app to benefit from Gemini Nano's capabilities without directly accessing or storing sensitive conversation data. You can find out more about how this works in the PCC Whitepaper.

Protecting your privacy through data isolation

AICore is built to isolate each request to protect your privacy. This prevents apps from accessing data that does not belong to them. Requests are handled independently and processed from a single app at a time to mitigate the risk of data being exposed to other apps.

Additionally, AICore doesn't store any record of the input data or the resulting outputs after processing each request. This design, combined with the fact that Gemini Nano’s inference happens directly on your device, helps ensure your app’s data stays private and secure.

Prioritizing Safety in Gemini Nano

A flow chart illustrating the architecture of an AI system, highlighting the flow of data and processing steps from the 'Client app' to the 'Service' component, including 'Input safety signals', 'Output safety signals', 'Weights' and 'Runtime'

We're committed to building AI responsibly, and that includes making sure Gemini Nano is safe. We've implemented multiple layers of protection to limit harmful or unintended results:

    • Native model safety: All Gemini models, including Gemini Nano, are trained to be safety-aware out of the box. This means safety considerations are built into the core of the model, not just added as an afterthought.
    • Safety aware fine-tuning: We use a LoRA fine-tuning block to adapt Gemini Nano for the needs of specific apps. When we train the LoRA block, we incorporate safety data specific to the app’s use case to preserve and even enhance the model's safety features during fine-tuning where applicable.
    • Safety filters on input and output: As a final safeguard, both the input prompt and results generated by the Gemini Nano runtime are evaluated against our safety filters before providing the results to the app. This helps prevent unsafe content from slipping through, without any loss in quality.

These layers of protection work together to ensure that Gemini Nano provides a safe and helpful experience for everyone.


Get started

Learn more about Gemini Nano for app development, and try it out in your own app!

Be sure to check out the other amazing AI on Android Spotlight week content!