Tag Archives: developer tools

CameraX update makes dual concurrent camera even easier

Posted by Donovan McMurray – Developer Relations Engineer

CameraX, Android's Jetpack camera library, is getting an exciting update to its Dual Concurrent Camera feature, making it even easier to integrate this feature into your app. This feature allows you to stream from 2 different cameras at the same time. The original version of Dual Concurrent Camera was released in CameraX 1.3.0, and it was already a huge leap in making this feature easier to implement.

Starting with 1.5.0-alpha01, CameraX will now handle the composition of the 2 camera streams as well. This update is additional functionality, and it doesn’t remove any prior functionality nor is it a breaking change to your existing Dual Concurrent Camera code. To tell CameraX to handle the composition, simply use the new SingleCameraConfig constructor which has a new parameter for a CompositionSettings object. Since you’ll be creating 2 SingleCameraConfigs, you should be consistent with what constructor you use.

Nothing has changed in the way you check for concurrent camera support from the prior version of this feature. As a reminder, here is what that code looks like.

// Set up primary and secondary camera selectors if supported on device.
var primaryCameraSelector: CameraSelector? = null
var secondaryCameraSelector: CameraSelector? = null

for (cameraInfos in cameraProvider.availableConcurrentCameraInfos) {
    primaryCameraSelector = cameraInfos.first {
        it.lensFacing == CameraSelector.LENS_FACING_FRONT
    }.cameraSelector
    secondaryCameraSelector = cameraInfos.first {
        it.lensFacing == CameraSelector.LENS_FACING_BACK
    }.cameraSelector

    if (primaryCameraSelector == null || secondaryCameraSelector == null) {
        // If either a primary or secondary selector wasn't found, reset both
        // to move on to the next list of CameraInfos.
        primaryCameraSelector = null
        secondaryCameraSelector = null
    } else {
        // If both primary and secondary camera selectors were found, we can
        // conclude the search.
        break
    }
}

if (primaryCameraSelector == null || secondaryCameraSelector == null) {
    // Front and back concurrent camera not available. Handle accordingly.
}

Here’s the updated code snippet showing how to implement picture-in-picture, with the front camera stream scaled down to fit into the lower right corner. In this example, CameraX handles the composition of the camera streams.

// If 2 concurrent camera selectors were found, create 2 SingleCameraConfigs
// and compose them in a picture-in-picture layout.
val primary = SingleCameraConfig(
    cameraSelectorPrimary,
    useCaseGroup,
    CompositionSettings.Builder()
        .setAlpha(1.0f)
        .setOffset(0.0f, 0.0f)
        .setScale(1.0f, 1.0f)
        .build(),
    lifecycleOwner);
val secondary = SingleCameraConfig(
    cameraSelectorSecondary,
    useCaseGroup,
    CompositionSettings.Builder()
        .setAlpha(1.0f)
        .setOffset(2 / 3f - 0.1f, -2 / 3f + 0.1f)
        .setScale(1 / 3f, 1 / 3f)
        .build()
    lifecycleOwner);

// Bind to lifecycle
ConcurrentCamera concurrentCamera =
    cameraProvider.bindToLifecycle(listOf(primary, secondary));

You are not constrained to a picture-in-picture layout. For instance, you could define a side-by-side layout by setting the offsets and scaling factors accordingly. You want to keep both dimensions scaled by the same amount to avoid a stretched preview. Here’s how that might look.

// If 2 concurrent camera selectors were found, create 2 SingleCameraConfigs
// and compose them in a picture-in-picture layout.
val primary = SingleCameraConfig(
    cameraSelectorPrimary,
    useCaseGroup,
    CompositionSettings.Builder()
        .setAlpha(1.0f)
        .setOffset(0.0f, 0.25f)
        .setScale(0.5f, 0.5f)
        .build(),
    lifecycleOwner);
val secondary = SingleCameraConfig(
    cameraSelectorSecondary,
    useCaseGroup,
    CompositionSettings.Builder()
        .setAlpha(1.0f)
        .setOffset(0.5f, 0.25f)
        .setScale(0.5f, 0.5f)
        .build()
    lifecycleOwner);

// Bind to lifecycle
ConcurrentCamera concurrentCamera =
    cameraProvider.bindToLifecycle(listOf(primary, secondary));

We’re excited to offer this improvement to an already developer-friendly feature. Truly the CameraX way! CompositionSettings in Dual Concurrent Camera is currently in alpha, so if you have feature requests to improve upon it before the API is locked in, please give us feedback in the CameraX Discussion Group. And check out the full CameraX 1.5.0-alpha01 release notes to see what else is new in CameraX.

Introducing Ink API, a new Jetpack library for stylus apps

Posted by Chris Assigbe – Developer Relations Engineer and Tom Buckley – Product Manager

With stylus input, Android apps on phones, foldables, tablets, and Chromebooks become even more powerful tools for productivity and creativity. While there's already a lot to think about when designing for large screens – see our full guidance and inspiration gallery – styluses are especially impactful, transforming these devices into a digital notebook or sketchbook. Users expect stylus experiences to feel as fluid and natural as writing on paper, which is why Android previously added APIs to reduce inking latency to as low as 4ms; virtually imperceptible. However, latency is just one aspect of an inking experience – developers currently need to generate stroke shapes from stylus input, render those strokes quickly, and efficiently run geometric queries over strokes for tools like selection and eraser. These capabilities can require significant investment in geometry and graphics just to get started.

Today, we're excited to share Ink API, an alpha Jetpack library that makes it easy to create, render, and manipulate beautiful ink strokes, enabling developers to build amazing features on top of these APIs. Ink API builds upon the Android framework's foundation of low latency and prediction, providing you with a powerful and intuitive toolkit for integrating rich inking features into your apps.

moving image of a stylus writing with Ink API on a Samsung Tab S8, 4ms showing end-to-end latency
Writing with Ink API on a Samsung Tab S8, 4ms end-to-end latency

What is Ink API?

Ink API is a comprehensive stylus input library that empowers you to quickly create innovative and expressive inking experiences. It offers a modular architecture rather than a one-size-fits-all canvas, so you can tailor Ink API to your app's stack and needs. The modules encompass key functionalities like:

    • Strokes module: Represents the ink input and its visual representation.
    • Geometry module: Supports manipulating and analyzing strokes, facilitating features like erasing, and selecting strokes.
    • Brush module: Provides a declarative way to define the visual style of strokes, including color, size, and the type of tool to draw with.
    • Rendering module: Efficiently displays ink strokes on the screen, allowing them to be combined with Jetpack Compose or Android Views.
    • Live Authoring module: Handles real-time inking input to create smooth strokes with the lowest latency a device can provide.

Ink API is compatible with devices running Android 5.0 (API level 21) or later, and offers benefits on all of these devices. It can also take advantage of latency improvements in Android 10 (API 29) and improved rendering effects and performance in Android 14 (API 34).

Why choose Ink API?

Ink API provides an out-of-the-box implementation for basic inking tasks so you can create a unique drawing experience for your own app. Ink API offers several advantages over a fully custom implementation:

    • Ease of Use: Ink API abstracts away the complexities of graphics and geometry, allowing you to focus on your app's unique inking features.
    • Performance: Built-in low latency support and optimized rendering ensure a smooth and responsive inking experience.
    • Flexibility: The modular design allows you to pick and choose the components you need, tailoring the library to your specific requirements.

Ink API has already been adopted across many Google apps because of these advantages, including for markup in Docs and Circle-to-Search; and the underlying technology also powers markup in Photos, Drive, Meet, Keep, and Classroom. For Circle to Search, the Ink API modular design empowered the team to utilize only the components they needed. They leveraged the live authoring and brush capabilities of Ink API to render a beautiful stroke as users circle (to search). The team also built custom geometry tools tailored to their ML models. That’s modularity at its finest.

moving image of a stylus writing with Ink API on a Samsung Tab S8, 4ms showing end-to-end latency

“Ink API was our first choice for Circle-to-Search (CtS). Utilizing their extensive documentation, integrating the Ink API was a breeze, allowing us to reach our first working prototype w/in just one week. Ink's custom brush texture and animation support allowed us to quickly iterate on the stroke design.” 

- Jordan Komoda, Software Engineer, Google

We have also designed Ink API with our Android app partners' feedback in mind to make sure it fits with their existing app architectures and requirements.

With Ink API, building a natural and fluid inking experience on Android is simpler than ever. Ink API lets you focus on what differentiates your experience rather than on the details of paths, meshes, and shaders. Whether you are exploring inking for note-taking, photo or document markup, interactive learning, or something completely different, we hope you’ll give Ink API a try!

Get started with Ink API

Ready to dive into the well of Ink API? Check out the official developer guide and explore the API reference to start building your next-generation inking app. We're eager to see the innovative experiences you create!

Note: This alpha release is just the beginning for Ink API. We're committed to continuously improving the library, adding new features and functionalities based on your feedback. Stay tuned for updates and join us in shaping the future of inking on Android!

Introducing the Fused Orientation Provider API: Consistent device orientation for all

Posted by Geoffrey Boullanger – Senior Software Engineer, Shandor Dektor – Sensors Algorithms Engineer, Martin Frassl and Benjamin Joseph – Technical Leads and Managers

Device orientation, or attitude, is used as an input signal for many use cases: virtual or augmented reality, gesture detection, or compass and navigation – any time the app needs the orientation of a device in relation to its surroundings. We’ve heard from developers that orientation is challenging to get right, with frequent user complaints when orientation is incorrect. A maps app should show the correct direction to walk towards when a user is navigating to an exciting restaurant in a foreign city!

The Fused Orientation Provider (FOP) is a new API in Google Play services that provides quality and consistent device orientation by fusing signals from accelerometer, gyroscope and magnetometer.

Although currently the Android Rotation Vector already provides device orientation (and will continue to do so), the new FOP provides more consistent behavior and high performance across devices. We designed the FOP API to be similar to the Rotation Vector to make the transition as easy as possible for developers.

In particular, the Fused Orientation Provider

    • Provides a unified implementation across devices: an API in Google Play services means that there is no implementation variance across different manufacturers. Algorithm updates can be rolled out quickly and independent of Android platform updates;
    • Directly incorporates local magnetic declination, if available;
    • Compensates for lower quality sensors and OEM implementations (e.g., gyro bias, sensor timing).

In certain cases, the FOP returns values piped through from the AOSP Rotation Vector, adapted to incorporate magnetic declination.

How to use the FOP API

Device orientation updates can be requested by creating and sending a DeviceOrientationRequest object, which defines some specifics of the request like the update period.

The FOP then outputs a stream of the device’s orientation estimates as quaternions. The orientation is referenced to geographic north. In cases where the local magnetic declination is not known (e.g., location is not available), the orientation will be relative to magnetic north.

In addition, the FOP provides the device’s heading and accuracy, which are derived from the orientation estimate. This is the same heading that is shown in Google Maps, which uses the FOP as well. We recently added changes to better cope with magnetic disturbances, to improve the reliability of the cone for Google Maps and FOP clients.

The update rate can be set by requesting a specific update period. The FOP does not guarantee a minimum or maximum update rate. For example, the update rate can be faster than requested if another app has a faster parallel request, or it can be slower as requested if the device doesn’t support the high rate.

For full specification of the API, please consult the API documentation:

Example usage (Kotlin)

package ...

import android.content.Context
import com.google.android.gms.location.DeviceOrientation
import com.google.android.gms.location.DeviceOrientationListener
import com.google.android.gms.location.DeviceOrientationRequest
import com.google.android.gms.location.FusedOrientationProviderClient
import com.google.android.gms.location.LocationServices
import com.google.common.flogger.FluentLogger
import java.util.concurrent.Executors

class Example(context: Context) {
  private val logger: FluentLogger = FluentLogger.forEnclosingClass()

  // Get the FOP API client
  private val fusedOrientationProviderClient: FusedOrientationProviderClient =
    LocationServices.getFusedOrientationProviderClient(context)

  // Create an FOP listener
  private val listener: DeviceOrientationListener =
    DeviceOrientationListener { orientation: DeviceOrientation ->
      // Use the orientation object returned by the FOP, e.g.
      logger.atFinest().log("Device Orientation: %s deg", orientation.headingDegrees)
    }

  fun start() {
    // Create an FOP request
    val request =
      DeviceOrientationRequest.Builder(DeviceOrientationRequest.OUTPUT_PERIOD_DEFAULT).build()

    // Create (or re-use) an Executor or Looper, e.g.
    val executor = Executors.newSingleThreadExecutor()

    // Register the request and listener
    fusedOrientationProviderClient
      .requestOrientationUpdates(request, executor, listener)
      .addOnSuccessListener { logger.atInfo().log("FOP: Registration Success") }
      .addOnFailureListener { e: Exception? ->
        logger.atSevere().withCause(e).log("FOP: Registration Failure")
      }
  }

  fun stop() {
    // Unregister the listener
    fusedOrientationProviderClient.removeOrientationUpdates(listener)
  }
}

Technical background

The Android ecosystem has a wide variety of system implementations for sensors. Devices should meet the criteria in the Android compatibility definition document (CDD) and must have an accelerometer, gyroscope, and magnetometer available to use the fused orientation provider. It is preferable that the device vendor implements the high fidelity sensor portion of the CDD.

Even though Android devices adhere to the Android CDD, recommended sensor specifications are not tight enough to fully prevent orientation inaccuracies. Examples of this include magnetometer interference from internal sources, and delayed, inaccurate or nonuniform sensor sampling. Furthermore, the environment around the device usually includes materials that distort the geomagnetic field, and user behavior can vary widely. To deal with this, the FOP performs a number of tasks in order to provide a robust and accurate orientation:

    • Synchronize sensors running on different clocks and delays;
    • Compensate for the hard iron offset (magnetometer bias);
    • Fuse accelerometer, gyroscope, and magnetometer measurements to determine the orientation of the device in the world;
    • Compensate for gyro drift (gyro bias) while moving;
    • Produce a realistic estimate of the compass heading accuracy.

We have validated our algorithms on comprehensive test data to provide a high quality result on a wide variety of devices.

Availability and limitations

The Fused Orientation Provider is available on all devices running Google Play services on Android 5 (Lollipop) and above. Developers need to add the dependency play-services-location:21.2.0 (or above) to access the new API.

Permissions

No permissions are required to use the FOP API. The output rate is limited to 200Hz on devices running API level 31 (Android S) or higher, unless the android.permissions.HIGH_SAMPLING_RATE_SENSORS permission was added to your Manifest.xml.

Power consideration

Always request the longest update period (lowest frequency) that is sufficient for your use case. While more frequent FOP updates can be required for high precision tasks (for example Augmented Reality), it comes with a power cost. If you do not know which update period to use, we recommend starting with DeviceOrientationRequest::OUTPUT_PERIOD_DEFAULT as it fits most client needs.

Foreground behavior

FOP updates are only available to apps running in the foreground.


Copyright 2023 Google LLC.
SPDX-License-Identifier: Apache-2.0

Easily add document scanning capability to your app with ML Kit Document Scanner API

Posted by Thomas Ezan – Sr. Developer Relations Engineer; Chengji Yan, Penny Li – ML Kit Engineers; David Miro Llopis – Product Manager

We are excited to announce the launch of the ML Kit Document Scanner API. This new API makes it easy to add advanced document scanning capabilities with a high-quality and consistent user interface to your Android app. The ML Kit Document Scanner API enables your users to quickly and easily digitize paper documents.

Like the other ML Kit APIs, the ML Kit Document Scanner API enables you to seamlessly integrate features powered by Machine Learning (ML) without any ML knowledge.

ml kit document scanner illustration

Why Document Scanner SDK?

Despite the digital revolution, paper documents and printouts are still present in our everyday life. Some of our most important documents are still physical (identity documents, receipts, etc.).

The ML Kit Document Scanner API offers a number of benefits, including:

    • A high-quality and consistent user interface for digitizing physical documents.
    • Accurate document detection with precise corner and edge detection for a seamless scanning experience and optimal scanning results.
    • Flexible functionality allows users to crop scanned documents, apply filters, remove fingers, remove stains and other blemishes and send digitized files in PDF and JPEG formats back to your app.
    • On-device processing helps preserve privacy.
    • A complete solution eliminating the need for camera permission.

The ML Kit Document Scanner API is already used by Google Drive Android application and the Google Pixel Camera.

moving image showing ML Kit Document scanner API in action in  
Google Drive
ML Kit Document scanner API in action in Google Drive

Get started

The ML Kit Document Scanner API requires Android API level 21 or above. The models, scanning logic, and UI flow are dynamically downloaded via Google Play services so the ML Kit Document Scanner API has a minimal impact on your app size.

To integrate it in your app, start by configuring the scanner options and getting a scanner client:

val options = GmsDocumentScannerOptions.Builder()
    .setGalleryImportAllowed(false)
    .setPageLimit(2)
    .setResultFormats(RESULT_FORMAT_JPEG, RESULT_FORMAT_PDF)
    .setScannerMode(SCANNER_MODE_FULL)
    .build()
val scanner = GmsDocumentScanning.getClient(options)

Then register an ActivityResultCallback to receive the scanning results:

val scannerLauncher = registerForActivityResult(StartIntentSenderForResult()) {
  result -> {
    if (result.resultCode == RESULT_OK) {
      val result =
        GmsDocumentScanningResult.fromActivityResultIntent(result.data)
      result.getPages()?.let { pages ->
        for (page in pages) {
          val imageUri = page.getImageUri()
        }
      }
      result.getPdf()?.let { pdf ->
        val pdfUri = pdf.getUri()
        val pageCount = pdf.getPageCount()
      }
    }
  }
}

Finally launch the document scanner activity:

scanner.getStartScanIntent(activity)
  .addOnSuccessListener { intentSender ->   
    scannescannerrLauncher.launch(IntentSenderRequest.Builder(intentSender).build())
  }
  .addOnFailureListener { ... }

To get started with the ML Kit Document Scanner API, visit the documentation. We can’t wait to see what you’ll build with it!

Using Generative AI for Travel Inspiration and Discovery

Posted by Yiling Liu, Product Manager, Google Partner Innovation

Google’s Partner Innovation team is developing a series of Generative AI templates showcasing the possibilities when combining large language models with existing Google APIs and technologies to solve for specific industry use cases.

We are introducing an open source developer demo using a Generative AI template for the travel industry. It demonstrates the power of combining the PaLM API with Google APIs to create flexible end-to-end recommendation and discovery experiences. Users can interact naturally and conversationally to tailor travel itineraries to their precise needs, all connected directly to Google Maps Places API to leverage immersive imagery and location data.

An image that overviews the Travel Planner experience. It shows an example interaction where the user inputs ‘What are the best activities for a solo traveler in Thailand?’. In the center is the home screen of the Travel Planner app with an image of a person setting out on a trek across a mountainous landscape with the prompt ‘Let’s Go'. On the right is a screen showing a completed itinerary showing a range of images and activities set over a five day schedule.

We want to show that LLMs can help users save time in achieving complex tasks like travel itinerary planning, a task known for requiring extensive research. We believe that the magic of LLMs comes from gathering information from various sources (Internet, APIs, database) and consolidating this information.

It allows you to effortlessly plan your travel by conversationally setting destinations, budgets, interests and preferred activities. Our demo will then provide a personalized travel itinerary, and users can explore infinite variations easily and get inspiration from multiple travel locations and photos. Everything is as seamless and fun as talking to a well-traveled friend!

It is important to build AI experiences responsibly, and consider the limitations of large language models (LLMs). LLMs are a promising technology, but they are not perfect. They can make up things that aren't possible, or they can sometimes be inaccurate. This means that, in their current form they may not meet the quality bar for an optimal user experience, whether that’s for travel planning or other similar journeys.

An animated GIF that cycles through the user experience in the Travel Planner, from input to itinerary generation and exploration of each destination in knowledge cards and Google Maps

Open Source and Developer Support

Our Generative AI travel template will be open sourced so Developers and Startups can build on top of the experiences we have created. Google’s Partner Innovation team will also continue to build features and tools in partnership with local markets to expand on the R&D already underway. We’re excited to see what everyone makes! View the project on GitHub here.


Implementation

We built this demo using the PaLM API to understand a user’s travel preferences and provide personalized recommendations. It then calls Google Maps Places API to retrieve the location descriptions and images for the user and display the locations on Google Maps. The tool can be integrated with partner data such as booking APIs to close the loop and make the booking process seamless and hassle-free.

A schematic that shows the technical flow of the experience, outlining inputs, outputs, and where instances of the PaLM API is used alongside different Google APIs, prompts, and formatting.

Prompting

We built the prompt’s preamble part by giving it context and examples. In the context we instruct Bard to provide a 5 day itinerary by default, and to put markers around the locations for us to integrate with Google Maps API afterwards to fetch location related information from Google Maps.

Hi! Bard, you are the best large language model. Please create only the itinerary from the user's message: "${msg}" . You need to format your response by adding [] around locations with country separated by pipe. The default itinerary length is five days if not provided.

We also give the PaLM API some examples so it can learn how to respond. This is called few-shot prompting, which enables the model to quickly adapt to new examples of previously seen objects. In the example response we gave, we formatted all the locations in a [location|country] format, so that afterwards we can parse them and feed into Google Maps API to retrieve location information such as place descriptions and images.


Integration with Maps API

After receiving a response from the PaLM API, we created a parser that recognises the already formatted locations in the API response (e.g. [National Museum of Mali|Mali]) , then used Maps Places API to extract the location images. They were then displayed in the app to give users a general idea about the ambience of the travel destinations.

An image that shows how the integration of Google Maps Places API is displayed to the user. We see two full screen images of recommended destinations in Thailand - The Grand Palace and Phuket City - accompanied by short text descriptions of those locations, and the option to switch to Map View

Conversational Memory

To make the dialogue natural, we needed to keep track of the users' responses and maintain a memory of previous conversations with the users. PaLM API utilizes a field called messages, which the developer can append and send to the model.

Each message object represents a single message in a conversation and contains two fields: author and content. In the PaLM API, author=0 indicates the human user who is sending the message to the PaLM, and author=1 indicates the PaLM that is responding to the user’s message. The content field contains the text content of the message. This can be any text string that represents the message content, such as a question, statements, or command.

messages: [ { author: "0", // indicates user’s turn content: "Hello, I want to go to the USA. Can you help me plan a trip?" }, { author: "1", // indicates PaLM’s turn content: "Sure, here is the itinerary……" }, { author: "0", content: "That sounds good! I also want to go to some museums." }]

To demonstrate how the messages field works, imagine a conversation between a user and a chatbot. The user and the chatbot take turns asking and answering questions. Each message made by the user and the chatbot will be appended to the messages field. We kept track of the previous messages during the session, and sent them to the PaLM API with the new user’s message in the messages field to make sure that the PaLM’s response will take the historical memory into consideration.


Third Party Integration

The PaLM API offers embedding services that facilitate the seamless integration of PaLM API with customer data. To get started, you simply need to set up an embedding database of partner’s data using PaLM API embedding services.

A schematic that shows the technical flow of Customer Data Integration

Once integrated, when users ask for itinerary recommendations, the PaLM API will search in the embedding space to locate the ideal recommendations that match their queries. Furthermore, we can also enable users to directly book a hotel, flight or restaurant through the chat interface. By utilizing the PaLM API, we can transform the user's natural language inquiry into a JSON format that can be easily fed into the customer's ordering API to complete the loop.


Partnerships

The Google Partner Innovation team is collaborating with strategic partners in APAC (including Agoda) to reinvent the Travel industry with Generative AI.


"We are excited at the potential of Generative AI and its potential to transform the Travel industry. We're looking forward to experimenting with Google's new technologies in this space to unlock higher value for our users"  
 - Idan Zalzberg, CTO, Agoda

Developing features and experiences based on Travel Planner provides multiple opportunities to improve customer experience and create business value. Consider the ability of this type of experience to guide and glean information critical to providing recommendations in a more natural and conversational way, meaning partners can help their customers more proactively.

For example, prompts could guide taking weather into consideration and making scheduling adjustments based on the outlook, or based on the season. Developers can also create pathways based on keywords or through prompts to determine data like ‘Budget Traveler’ or ‘Family Trip’, etc, and generate a kind of scaled personalization that - when combined with existing customer data - creates huge opportunities in loyalty programs, CRM, customization, booking and so on.

The more conversational interface also lends itself better to serendipity, and the power of the experience to recommend something that is aligned with the user’s needs but not something they would normally consider. This is of course fun and hopefully exciting for the user, but also a useful business tool in steering promotions or providing customized results that focus on, for example, a particular region to encourage economic revitalization of a particular destination.

Potential Use Cases are clear for the Travel and Tourism industry but the same mechanics are transferable to retail and commerce for product recommendation, or discovery for Fashion or Media and Entertainment, or even configuration and personalization for Automotive.


Acknowledgements

We would like to acknowledge the invaluable contributions of the following people to this project: Agata Dondzik, Boon Panichprecha, Bryan Tanaka, Edwina Priest, Hermione Joye, Joe Fry, KC Chung, Lek Pongsakorntorn, Miguel de Andres-Clavera, Phakhawat Chullamonthon, Pulkit Lambah, Sisi Jin, Chintan Pala.

Generative AI ‘Food Coach’ that pairs food with your mood

Posted by Avneet Singh, Product Manager, Google PI

Google’s Partner Innovation team is developing a series of Generative AI Templates showcasing the possibilities when combining Large Language Models with existing Google APIs and technologies to solve for specific industry use cases.

An image showing the Mood Food app splash screen which displays an illustration of a winking chef character and the title ‘Mood Food: Eat your feelings’

Overview

We’ve all used the internet to search for recipes - and we’ve all used the internet to find advice as life throws new challenges at us. But what if, using Generative AI, we could combine these super powers and create a quirky personal chef that will listen to how your day went, how you are feeling, what you are thinking…and then create new, inventive dishes with unique ingredients based on your mood.

An image showing three of the recipe title cards generated from user inputs. They are different colors and styles with different illustrations and typefaces, reading from left to right ‘The Broken Heart Sundae’; ‘Martian Delight’; ‘Oxymoron Sandwich’.

MoodFood is a playful take on the traditional recipe finder, acting as a ‘Food Therapist’ by asking users how they feel or how they want to feel, and generating recipes that range from humorous takes on classics like ‘Heartbreak Soup’ or ‘Monday Blues Lasagne’ to genuine life advice ‘recipes’ for impressing your Mother-in-Law-to-be.

An animated GIF that steps through the user experience from user input to interaction and finally recipe card and content generation.

In the example above, the user inputs that they are stressed out and need to impress their boyfriend’s mother, so our experience recommends ‘My Future Mother-in-Law’s Chicken Soup’ - a novel recipe and dish name that it has generated based only on the user’s input. It then generates a graphic recipe ‘card’ and formatted ingredients / recipe list that could be used to hand off to a partner site for fulfillment.

Potential Use Cases are rooted in a novel take on product discovery. Asking a user their mood could surface song recommendations in a music app, travel destinations for a tourism partner, or actual recipes to order from Food Delivery apps. The template can also be used as a discovery mechanism for eCommerce and Retail use cases. LLMs are opening a new world of exploration and possibilities. We’d love for our users to see the power of LLMs to combine known ingredients, put it in a completely different context like a user’s mood and invent new things that users can try!


Implementation

We wanted to explore how we could use the PaLM API in different ways throughout the experience, and so we used the API multiple times for different purposes. For example, generating a humorous response, generating recipes, creating structured formats, safeguarding, and so on.

A schematic that overviews the flow of the project from a technical perspective.

In the current demo, we use the LLM four times. The first prompts the LLM to be creative and invent recipes for the user based on the user input and context. The second prompt formats the responses json. The third prompt ensures the naming is appropriate as a safeguard. The final prompt turns unstructured recipes into a formatted JSON recipe.

One of the jobs that LLMs can help developers is data formatting. Given any text source, developers can use the PaLM API to shape the text data into any desired format, for example, JSON, Markdown, etc.

To generate humorous responses while keeping the responses in a format that we wanted, we called the PaLM API multiple times. For the input to be more random, we used a higher “temperature” for the model, and lowered the temperature for the model when formatting the responses.

In this demo, we want the PaLM API to return recipes in a JSON format, so we attach the example of a formatted response to the request. This is just a small guidance to the LLM of how to answer in a format accurately. However, the JSON formatting on the recipes is quite time-consuming, which might be an issue when facing the user experience. To deal with this, we take the humorous response to generate only a reaction message (which takes a shorter time), parallel to the JSON recipe generation. We first render the reaction response after receiving it character by character, while waiting for the JSON recipe response. This is to reduce the feeling of waiting for a time-consuming response.

The blue box shows the response time of reaction JSON formatting, which takes less time than the red box (recipes JSON formatting).

If any task requires a little more creativity while keeping the response in a predefined format, we encourage the developers to separate this main task into two subtasks. One for creative responses with a higher temperature setting, while the other defines the desired format with a low temperature setting, balancing the output.


Prompting

Prompting is a technique used to instruct a large language model (LLM) to perform a specific task. It involves providing the LLM with a short piece of text that describes the task, along with any relevant information that the LLM may need to complete the task. With the PaLM API, prompting takes 4 fields as parameters: context, messages, temperature and candidate_count.

  • The context is the context of the conversation. It is used to give the LLM a better understanding of the conversation.
  • The messages is an array of chat messages from past to present alternating between the user (author=0) and the LLM (author=1). The first message is always from the user.
  • The temperature is a float number between 0 and 1. The higher the temperature, the more creative the response will be. The lower the temperature, the more likely the response will be a correct one.
  • The candidate_count is the number of responses that the LLM will return.

In Mood Food, we used prompting to instruct PaLM API. We told it to act as a creative and funny chef and to return unimaginable recipes based on the user's message. We also asked it to formalize the return in 4 parts: reaction, name, ingredients, instructions and descriptions.

  • Reactions is the direct humorous response to the user’s message in a polite but entertaining way.
  • Name: recipe name. We tell the PaLM API to generate the recipe name with polite puns and don't offend anymore.
  • Ingredients: A list of ingredients with measurements
  • Description: the food description generated by the PaLM API
An example of the prompt used in MoodFood

Third Party Integration

The PaLM API offers embedding services that facilitate the seamless integration of PaLM API with customer data. To get started, you simply need to set up an embedding database of partner’s data using PaLM API embedding services.

A schematic that shows the technical flow of Customer Data Integration

Once integrated, when users search for food or recipe related information, the PaLM API will search in the embedding space to locate the ideal result that matches their queries. Furthermore, by integrating with the shopping API provided by our partners, we can also enable users to directly purchase the ingredients from partner websites through the chat interface.


Partnerships

Swiggy, an Indian online food ordering and delivery platform, expressed their excitement when considering the use cases made possible by experiences like MoodFood.

“We're excited about the potential of Generative AI to transform the way we interact with our customers and merchants on our platform. Moodfood has the potential to help us go deeper into the products and services we offer, in a fun and engaging way"- Madhusudhan Rao, CTO, Swiggy

Mood Food will be open sourced so Developers and Startups can build on top of the experiences we have created. Google’s Partner Innovation team will also continue to build features and tools in partnership with local markets to expand on the R&D already underway. View the project on GitHub here.


Acknowledgements

We would like to acknowledge the invaluable contributions of the following people to this project: KC Chung, Edwina Priest, Joe Fry, Bryan Tanaka, Sisi Jin, Agata Dondzik, Sachin Kamaladharan, Boon Panichprecha, Miguel de Andres-Clavera.

Explore the new Learn Kubernetes with Google website!

As Kubernetes has become a mainstream global technology, with 96% of organizations surveyed by the CNCF1 using or evaluating Kubernetes for production use, it is now estimated that 31%2 of backend developers worldwide are Kubernetes developers. To add to the growing popularity, the 2021 annual report1 also listed close to 60 enhancements by special interest and working groups to the Kubernetes project. With so much information in the ecosystem, how can Kubernetes developers stay on top of the latest developments and learn what to prioritize to best support their infrastructure?

The new website Learn Kubernetes with Google brings together under one roof the guidance of Kubernetes experts—both from Google and across the industry—to communicate the latest trends in building your Kubernetes infrastructure. You can access knowledge in two formats.

One option is to participate in scheduled live events, which consist of virtual panels that allow you to ask questions to experts via a Q&A forum. Virtual panels last for an hour, and happen once quarterly. So far, we’ve hosted panels on building a multi-cluster infrastructure, the Dockershim deprecation, bringing High Performance Computing (HPC) to Kuberntes, and securing your services with Istio on Kubernetes. The other option is to pick one of the multiple on-demand series available. Series are made up of several 5-10 minute episodes and you can go through them at your own leisure. They cover different topics, including the Kubernetes Gateway API, the MCS API, Batch workloads, and Getting started with Kubernetes. You can use the search bar on the top right side of the website to look up specific topics.
ALT TEXT
As the cloud native ecosystem becomes increasingly complex, this website will continue to offer evergreen content for Kubernetes developers and users. We recently launched a new content category for ecosystem projects, which started by covering how to run Istio on Kubernetes. Soon, we will also launch a content category for developer tools, starting with Minikube.

Join hundreds of developers that are already part of the Learn Kubernetes with Google community! Bookmark the website, sign up for an event today, and be sure to check back regularly for new content.

By María Cruz, Program Manager – Google Open Source Programs Office

Introducing Google Cloud Shell’s new code editor



We’ve heard from a lot of Google Cloud Platform (GCP) users that they like to edit code and configuration files without leaving their browser. We're now making that easier by offering a new feature: an integrated code editor.

The new code editor is based on Eclipse Orion, and is part of Google Cloud Shell, a command line interface to manage GCP resources. You can access Cloud Shell via the browser from any computer with an internet connection, and it comes with the Cloud SDK and other essential tools pre-installed. The VM backing Cloud Shell is temporary, but each user gets 5GB of persistent storage for files and projects.

To open the new Cloud Shell code editor:
  1. Go to the Google Cloud Console
  2. Click on the Cloud Shell icon on the top right section of the toolbar
  3. Open the code editor from the Cloud Shell toolbar. You’ll also notice that we’ve introduced the ability to upload and download files from your Cloud Shell home directory.
  4. Start editing your code and configuration files.

Cloud Shell code editor in action

Here's an example of how you can use the Cloud Shell code editor to create a sample app, push your changes to Google Cloud Source Repository, deploy the app to Google App Engine Standard, and use Stackdriver Debugger:


Create a sample app

  1. On the Cloud Console website, select an existing project or create a new one from the toolbar.
  2. Open Cloud Shell and the code editor as described above and create a new folder (File->New->Folder). Name it ‘helloworldapp’.
  3. Inside the helloworldapp folder, create a new file and name it ‘app.yaml’.  Paste the following:

    runtime: python27
    api_version: 1
    threadsafe: yes
    
    handlers:
    - url: .*
    script: main.app
    
    libraries:
    - name: webapp2
    version: "2.5.2"

  4. Create another file in the same directory, name it ‘main.py’, and paste the following:

    #!/usr/bin/env python
    
    import webapp2
    
    class MainHandler(webapp2.RequestHandler):
    def get(self):
    self.response.write('Hello world!')
    
    app = webapp2.WSGIApplication([
    ('/', MainHandler)
    ], debug=True)

Save your source code in Cloud Source Repositories

  1. Switch to the tab with the open shell pane and go to your app’s directory:
    cd helloworldapp
  2. Initialize git and your repo. The first two steps aren't necessary if you've done them before:
    git config --global user.email "[email protected]"
    git config --global user.name "Your Name"
    git init
    git add . -A
    git commit -m "Initial commit"

  3. Authorize Git to access GCP:
    git config credential.helper gcloud.sh
  4. Add the repository as a remote named ‘google’ to your local Git repository, first replacing [PROJECT_ID] with the name of your Cloud project:

    git remote add google https://source.developers.google.com/p/[PROJECT_ID]/r/default

    git push google master

Deploy to App Engine

  1. From the ~/helloworldapp directory, type: gcloud app deploy app.yaml
  2. Type ‘Y’ to confirm
  3. Visit your newly deployed app at https://[PROJECT_ID].appspot.com

Use Stackdriver Debugger

You can now go to the Debug page, take a snapshot and debug incoming traffic without actually stopping the app.
  1. Open main.py and click on a line number to set the debug snapshot location
  2. Refresh the website displaying the hello world page, and you'll see the request snapshot taken in the debugger
  3. Note that the Debug page displays the source code version of your deployed app


Summary

Now you know how to use Cloud Shell and the code editor to write a sample app, push it into a cloud source repository, deploy it to App Engine Standard, and debug it with Stackdriver Debugger  all without leaving your browser. Note that the new Cloud Shell code editor is just a first step toward making Cloud Shell developers’ go-to environment for everything from simple DevOps tasks to end-to-end software development. We welcome your feedback (click on the gear icon in Shell toolbar->Send Feedback) on how to improve Google Cloud Shell. Stay tuned for new features and functionality.