Tag Archives: TensorFlow

Massively Scaling Reinforcement Learning with SEED RL



Reinforcement learning (RL) has seen impressive advances over the last few years, as demonstrated by the recent success in solving games such as Go and Dota 2. Models, or agents, learn by exploring an environment, such as a game, while optimizing for specified goals. However, current RL techniques require increasingly large amounts of training to successfully learn even simple games, which makes iterating on research and product ideas computationally expensive and time consuming.

In “SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference”, we present an RL agent that scales to thousands of machines, which enables training at millions of frames per second, and significantly improves computational efficiency. This is achieved with a novel architecture that takes advantage of accelerators (GPUs or TPUs) at scale by centralizing model inference and introducing a fast communication layer. We demonstrate the performance of SEED RL on popular RL benchmarks, such as Google Research Football, Arcade Learning Environment and DeepMind Lab, and show that by using larger models, data efficiency can be increased. The code has been open sourced on GitHub together with examples for running on Google Cloud with GPUs.

Current Distributed Architectures
The previous generation of distributed reinforcement learning agents, such as IMPALA, made use of accelerators specialized for numerical calculations, taking advantage of the speed and efficiency from which (un)supervised learning has benefited for years. The architecture of an RL agent is usually separated into actors and learners. The actors typically run on CPUs and iterate between taking steps in the environment and running inference on the model to predict the next action. Frequently the actor will update the parameters of the inference model, and after collecting a sufficient amount of observations, will send a trajectory of observations and actions to the learner, which then optimizes the model. In this architecture, the learner trains the model on GPUs using input from distributed inference on hundreds of machines.

Example architecture for an earlier generation RL agent, IMPALA. Inference is done on the actors, usually using inefficient CPUs. Updated model parameters are frequently sent from the learner to the actors, increasing bandwidth requirements.
The architecture of RL agents such as IMPALA has a number of drawbacks:
  1. Using CPUs for neural network inference is much less efficient and slower than using accelerators and becomes problematic as models become larger and more computationally expensive.
  2. The bandwidth required for sending parameters and intermediate model states between the actors and learner can be a bottleneck.
  3. Handling two completely different tasks on one machine (i.e., environment rendering and inference) is unlikely to utilize machine resources optimally.
SEED RL Architecture
The SEED RL architecture is designed to solve these drawbacks. With this approach, neural network inference is done centrally by the learner on specialized hardware (GPUs or TPUs), enabling accelerated inference and avoiding the data transfer bottleneck by ensuring that the model parameters and state are kept local. While observations are sent to the learner at every environment step, latency is kept low due to a very efficient network library based on the gRPC framework with asynchronous streaming RPCs. This makes it possible to achieve up to a million queries per second on a single machine. The learner can be scaled to thousands of cores (e.g., up to 2048 on Cloud TPUs) and the number of actors can be scaled to thousands of machines to fully utilize the learner, making it possible to train at millions of frames per second. SEED RL is based on the TensorFlow 2 API and, in our experiments, was accelerated by TPUs.
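To make the division of labor concrete, the following is a highly simplified sketch of the two loops this architecture implies. It is not SEED RL's actual code: `env`, `stub`, `request_queue`, and the request objects are hypothetical stand-ins for the environment, the gRPC client, and the learner's request plumbing.

```python
# Simplified sketch of the SEED RL division of labor (illustrative only).

def actor_loop(env, stub):
    """Runs on a cheap CPU machine: steps the environment, holds no model."""
    observation = env.reset()
    while True:
        # Stream the observation to the learner and block on the chosen
        # action; in SEED RL this round trip uses asynchronous streaming gRPC.
        action = stub.inference(observation)
        observation, reward, done, _ = env.step(action)
        if done:
            observation = env.reset()

def learner_inference_loop(model, request_queue, batch_size=64):
    """Runs on the accelerator: serves all actors with batched forward passes."""
    while True:
        requests = [request_queue.get() for _ in range(batch_size)]
        observations = [r.observation for r in requests]
        actions = model.predict_actions(observations)  # one batched inference call
        for request, action in zip(requests, actions):
            request.reply(action)
```

Because the model parameters never leave the learner, only observations and actions cross the network, which is what removes the parameter-transfer bottleneck described above.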
Overview of the architecture of SEED RL. In contrast to the IMPALA architecture, the actors only take actions in environments. Inference is executed centrally by the learner on accelerators using batches of data from multiple actors.
In order for this architecture to be successful, two state-of-the-art algorithms are integrated into SEED RL. The first is V-trace, a policy gradient-based method, first introduced with IMPALA. In general, policy gradient-based methods predict an action distribution from which an action can be sampled. However, because the actors and the learner execute asynchronously in SEED RL, the policy of actors is slightly behind the policy of the learner, i.e., they become off-policy. The usual policy gradient-based methods are on-policy, meaning that they have the same policy for actors and learner, and suffer from convergence and numerical issues in off-policy settings. V-trace is an off-policy method and thus works well in the asynchronous SEED RL architecture.
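For reference, the V-trace n-step value target from the IMPALA paper, which SEED RL reuses, corrects for this policy lag; here μ is the (slightly stale) behavior policy of the actors and π is the learner's current target policy:

```latex
v_s = V(x_s) + \sum_{t=s}^{s+n-1} \gamma^{t-s}
      \Big( \prod_{i=s}^{t-1} c_i \Big) \delta_t V,
\qquad
\delta_t V = \rho_t \big( r_t + \gamma V(x_{t+1}) - V(x_t) \big)
```

where \(\rho_t = \min\big(\bar{\rho},\, \pi(a_t \mid x_t) / \mu(a_t \mid x_t)\big)\) and \(c_i = \min\big(\bar{c},\, \pi(a_i \mid x_i) / \mu(a_i \mid x_i)\big)\) are truncated importance weights; the truncation is what keeps the off-policy correction numerically stable.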

The second algorithm is R2D2, a Q-learning method that selects an action based on the predicted future value of that action using recurrent distributed replay. This approach allows the Q-learning algorithm to be run at scale, while still allowing the use of recurrent neural networks that can predict future values based on the information of all past frames in an episode.

Experiments
SEED RL is benchmarked on the commonly used Arcade Learning Environment, DeepMind Lab environments, and on the recently released Google Research Football environment.
Frames per second comparing IMPALA and various configurations of SEED RL on DeepMind Lab. SEED RL achieves 2.4M frames per second using 4,160 CPUs. Assuming the same speed, IMPALA would need 14,000 CPUs.
On DeepMind Lab, we achieve 2.4 million frames per second with 64 Cloud TPU cores, which represents an improvement of 80x over the previous state-of-the-art distributed agent, IMPALA. This results in a significant speed-up in wall-clock time and computational efficiency. IMPALA requires 3-4x as many CPUs as SEED RL for the same speed.
Episode return (i.e., the sum of rewards) over time on the DeepMind Lab game “explore_goal_locations_small” using IMPALA and SEED RL. With SEED RL, the time to train is significantly reduced.
With an architecture optimized for use on modern accelerators, it’s natural to increase the model size in an attempt to increase data efficiency. We show that by increasing the size of the model and the input resolution, we are able to solve a previously unsolved Google Research Football task, “Hard”.
The score of different architectures on the Google Research Football “Hard” task. We show that by using a higher input resolution and a larger model, the score is improved, and with more training, the model can significantly outperform the built-in AI.
Additional details are provided in the paper, including our results on the Arcade Learning Environment. We believe SEED RL and the results presented demonstrate that reinforcement learning has once again caught up with the rest of the deep learning field in terms of taking advantage of accelerators.

Acknowledgements
This project was done in collaboration with Raphaël Marinier, Piotr Stanczyk, Ke Wang, Marcin Andrychowicz and Marcin Michalski. We would also like to thank Tom Small for the visualizations.

Source: Google AI Blog


Announcing TensorFlow Quantum: An Open Source Library for Quantum Machine Learning



“Nature isn’t classical, dammit, so if you want to make a simulation of nature, you’d better make it quantum mechanical.” — Physicist Richard Feynman

Machine learning (ML), while it doesn’t exactly simulate systems in nature, has the ability to learn a model of a system and predict the system’s behavior. Over the past few years, classical ML models have shown promise in tackling challenging scientific issues, leading to advancements in image processing for cancer detection, forecasting earthquake aftershocks, predicting extreme weather patterns, and detecting new exoplanets. With the recent progress in the development of quantum computing, the development of new quantum ML models could have a profound impact on the world’s biggest problems, leading to breakthroughs in the areas of medicine, materials, sensing, and communications. However, to date there has been a lack of research tools to discover useful quantum ML models that can process quantum data and execute on quantum computers available today.

Today, in collaboration with the University of Waterloo, X, and Volkswagen, we announce the release of TensorFlow Quantum (TFQ), an open-source library for the rapid prototyping of quantum ML models. TFQ provides the tools necessary for bringing the quantum computing and machine learning research communities together to control and model natural or artificial quantum systems; e.g. Noisy Intermediate Scale Quantum (NISQ) processors with ~50 - 100 qubits.

Under the hood, TFQ integrates Cirq with TensorFlow, and offers high-level abstractions for the design and implementation of both discriminative and generative quantum-classical models by providing quantum computing primitives compatible with existing TensorFlow APIs, along with high-performance quantum circuit simulators.

What is a Quantum ML Model?
A quantum model has the ability to represent and generalize data with a quantum mechanical origin. However, to understand quantum models, two concepts must be introduced: quantum data and hybrid quantum-classical models.

Quantum data exhibits superposition and entanglement, leading to joint probability distributions that could require an exponential amount of classical computational resources to represent or store. Quantum data, which can be generated or simulated on quantum processors, sensors, or networks, includes the simulation of chemicals and quantum matter, quantum control, quantum communication networks, quantum metrology, and much more.

A technical, but key, insight is that quantum data generated by NISQ processors are noisy and are typically entangled just before the measurement occurs. However, applying quantum machine learning to noisy entangled quantum data can maximize extraction of useful classical information. Inspired by these techniques, the TFQ library provides primitives for the development of models that disentangle and generalize correlations in quantum data, opening up opportunities to improve existing quantum algorithms or discover new quantum algorithms.

The second concept to introduce is hybrid quantum-classical models. Because near-term quantum processors are still fairly small and noisy, quantum models cannot use quantum processors alone — NISQ processors will need to work in concert with classical processors to become effective. As TensorFlow already supports heterogeneous computing across CPUs, GPUs, and TPUs, it is a natural platform for experimenting with hybrid quantum-classical algorithms.

TFQ contains the basic structures, such as qubits, gates, circuits, and measurement operators that are required for specifying quantum computations. User-specified quantum computations can then be executed in simulation or on real hardware. Cirq also contains substantial machinery that helps users design efficient algorithms for NISQ machines, such as compilers and schedulers, and enables the implementation of hybrid quantum-classical algorithms to run on quantum circuit simulators, and eventually on quantum processors.
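As a small illustration of how these basic structures compose in Cirq (the circuit itself is arbitrary, chosen only to show a qubit, a parameterized gate, an entangling gate, and a measurement):

```python
import cirq
import sympy

# Illustrative only: two qubits, a parameterized rotation, an entangling
# gate, and a measurement, built from the structures described above.
a, b = cirq.GridQubit.rect(1, 2)
theta = sympy.Symbol('theta')
circuit = cirq.Circuit(
    cirq.rx(theta)(a),              # parameterized single-qubit rotation
    cirq.CNOT(a, b),                # entangling two-qubit gate
    cirq.measure(b, key='result'))  # measurement operator
print(circuit)
```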

We’ve used TensorFlow Quantum for hybrid quantum-classical convolutional neural networks, machine learning for quantum control, layer-wise learning for quantum neural networks, quantum dynamics learning, generative modeling of mixed quantum states, and learning to learn with quantum neural networks via classical recurrent neural networks. We provide a review of these quantum applications in the TFQ white paper; each example can be run in-browser via Colab from our research repository.

How TFQ works
TFQ allows researchers to construct quantum datasets, quantum models, and classical control parameters as tensors in a single computational graph. The outcome of quantum measurements, leading to classical probabilistic events, is obtained by TensorFlow Ops. Training can be done using standard Keras functions.

To provide some intuition on how to use quantum data, one may consider a supervised classification of quantum states using a quantum neural network. Just like classical ML, a key challenge of quantum ML is to classify “noisy data”. To build and train such a model, the researcher can do the following (a condensed code sketch follows the figure caption below):
  1. Prepare a quantum dataset - Quantum data is loaded as tensors (a multi-dimensional array of numbers). Each quantum data tensor is specified as a quantum circuit written in Cirq that generates quantum data on the fly. The tensor is executed by TensorFlow on the quantum computer to generate a quantum dataset.
  2. Evaluate a quantum neural network model - The researcher can prototype a quantum neural network using Cirq that they will later embed inside of a TensorFlow compute graph. Parameterized quantum models can be selected from several broad categories based on knowledge of the quantum data's structure. The goal of the model is to perform quantum processing in order to extract information hidden in a typically entangled state. In other words, the quantum model essentially disentangles the input quantum data, leaving the hidden information encoded in classical correlations, thus making it accessible to local measurements and classical post-processing.
  3. Sample or Average - Measurement of quantum states extracts classical information in the form of samples from a classical random variable. The distribution of values from this random variable generally depends on the quantum state itself and on the measured observable. As many variational algorithms depend on mean values of measurements, also known as expectation values, TFQ provides methods for averaging over several runs involving steps (1) and (2).
  4. Evaluate a classical neural network model - Once classical information has been extracted, it is in a format amenable to further classical post-processing. As the extracted information may still be encoded in classical correlations between measured expectations, classical deep neural networks can be applied to distill such correlations.
  5. Evaluate Cost Function - Given the results of classical post-processing, a cost function is evaluated. This could be based on how accurately the model performs the classification task if the quantum data was labeled, or other criteria if the task is unsupervised.
  6. Evaluate Gradients & Update Parameters - After evaluating the cost function, the free parameters in the pipeline should be updated in a direction expected to decrease the cost. This is most commonly performed via gradient descent.
A high-level abstract overview of the computational steps involved in the end-to-end pipeline for inference and training of a hybrid quantum-classical discriminative model for quantum data in TFQ. To see the code for an end-to-end example, please check the “Hello Many-Worlds” example, the quantum convolutional neural networks tutorial, and our guide.
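A condensed version of steps (1)-(6) might look like the following minimal sketch. It uses the public tfq.convert_to_tensor and tfq.layers.PQC APIs, but the data circuits, model circuit, observable, and hyperparameters are illustrative choices rather than drawn from those tutorials:

```python
import cirq
import numpy as np
import sympy
import tensorflow as tf
import tensorflow_quantum as tfq

# (1) Prepare a quantum dataset: data circuits encoded as a string tensor.
qubit = cirq.GridQubit(0, 0)
angles = np.linspace(0, np.pi, 20)
data_circuits = [cirq.Circuit(cirq.rx(a)(qubit)) for a in angles]
labels = (angles > np.pi / 2).astype(np.float32)
circuit_tensor = tfq.convert_to_tensor(data_circuits)

# (2) Define a parameterized quantum model circuit.
theta = sympy.Symbol('theta')
model_circuit = cirq.Circuit(cirq.ry(theta)(qubit))

# (2)-(3) The PQC layer appends the model circuit to each data circuit and
# averages measurements of the Z observable into expectation values.
inputs = tf.keras.Input(shape=(), dtype=tf.string)
expectations = tfq.layers.PQC(model_circuit, cirq.Z(qubit))(inputs)

# (4)-(6) Classical post-processing, the cost function, and gradient
# updates are handled by standard Keras machinery.
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(expectations)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
              loss='binary_crossentropy')
model.fit(circuit_tensor, labels, epochs=20, verbose=0)
```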
A key feature of TensorFlow Quantum is the ability to simultaneously train and execute many quantum circuits. This is achieved by TensorFlow’s ability to parallelize computation across a cluster of computers, and the ability to simulate relatively large quantum circuits on multi-core computers. To achieve the latter, we are also announcing the release of qsim (github link), a new high-performance open source quantum circuit simulator, which has demonstrated the ability to simulate a 32-qubit quantum circuit with a gate depth of 14 in 111 seconds on a single Google Cloud node (n1-ultramem-160) (see this paper for details). The simulator is particularly optimized for multi-core Intel processors. Combined with TFQ, we have demonstrated 1 million circuit simulations for a 20-qubit quantum circuit at a gate depth of 20 in 60 minutes on a Google Cloud node (n2-highcpu-80). See Section II E of the TFQ white paper, on quantum circuit simulation with qsim, for more information.
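For a flavor of the simulator's interface, here is an arbitrary illustrative circuit sampled through qsim's Cirq bindings (the qsimcirq Python package) rather than its C++ core:

```python
import cirq
import qsimcirq

# An arbitrary 20-qubit circuit: a layer of Hadamards, a chain of CZ gates,
# and a terminal measurement, sampled with the qsim simulator.
qubits = cirq.LineQubit.range(20)
circuit = cirq.Circuit(
    [cirq.H(q) for q in qubits],
    [cirq.CZ(qubits[i], qubits[i + 1]) for i in range(19)],
    cirq.measure(*qubits, key='m'))

simulator = qsimcirq.QSimSimulator()
result = simulator.run(circuit, repetitions=1000)
print(result.histogram(key='m'))
```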

Looking Forward
Today, TensorFlow Quantum is primarily geared towards executing quantum circuits on classical quantum circuit simulators. In the future, TFQ will be able to execute quantum circuits on actual quantum processors that are supported by Cirq, including Google’s own processor Sycamore.

To learn more about TFQ, please read our white paper and visit the TensorFlow Quantum website. We believe that bridging the ML and Quantum communities will lead to exciting new discoveries across the board and accelerate the discovery of new quantum algorithms to solve the world’s most challenging problems.

Acknowledgements
This open source project is led by the Google AI Quantum team, and was co-developed by the University of Waterloo, Alphabet’s X, and Volkswagen. A special thanks to the University of Waterloo, whose students made major contributions to this open source software through multiple internship projects at the Google AI Quantum lab.

Source: Google AI Blog


Toward Human-Centered Design for ML Frameworks



As machine learning (ML) increasingly impacts diverse stakeholders and social groups, it has become necessary for a broader range of developers — even those without formal ML training — to be able to adapt and apply ML to their own problems. In recent years, there have been many efforts to lower the barrier to machine learning, by abstracting complex model behavior into higher-level APIs. For instance, Google has been developing TensorFlow.js, an open-source framework that lets developers write ML code in JavaScript to run directly in web browsers. Despite the abundance of engineering work towards improving APIs, little is known about what non-ML software developers actually need to successfully adopt ML into their daily work practices. Specifically, what do they struggle with when trying to use modern ML frameworks, and what do they want these frameworks to provide?

In “Software Developers Learning Machine Learning: Motivations, Hurdles, and Desires,” which received a Best Paper Award at the IEEE conference on Visual Languages and Human-Centric Computing (VL/HCC), we share our research on these questions and report the results from a large-scale survey of 645 people who used TensorFlow.js. The vast majority of respondents were software or web developers, who were fairly new to machine learning and usually did not use ML as part of their primary job. We examined the hurdles experienced by developers when using ML frameworks and explored the features and tools that they felt would best assist in their adoption of these frameworks into their programming workflows.

What Do Developers Struggle With Most When Using ML Frameworks?
Interestingly, by far the most common challenge reported by developers was not the lack of a clear API, but rather their own lack of conceptual understanding of ML, which hindered their ability to successfully use ML frameworks. These hurdles ranged from the initial stages of picking a good problem to which they could apply TensorFlow.js (e.g., survey respondents reported not knowing “what to apply ML to, where ML succeeds, where it sucks”), to creating the architecture of a neural net (e.g., “how many units [do] I have to put in when adding layers to the model?”) and knowing how to set and tune parameters during model training (e.g., “deciding what optimizers, loss functions to use”). Without a conceptual understanding of how different parameters affect outcomes, developers often felt overwhelmed by the seemingly infinite space of parameters to tune when debugging ML models.

Without sufficient conceptual support, developers also found it hard to transfer lessons learned from “hello world” API tutorials to their own real-world problems. While API tutorials provide syntax for implementing specific models (e.g., classifying MNIST digits), they typically don't provide the underlying conceptual scaffolding necessary to generalize beyond that specific problem.

Developers often attributed these challenges to their own lack of experience in advanced mathematics. Ironically, despite the abundance of non-experts tinkering with ML frameworks nowadays, many felt that ML frameworks were intended for specialists with advanced training in linear algebra and calculus, and thus not meant for general software developers or product managers. This sense of imposter syndrome may be fueled by the prevalence of esoteric mathematical terminology in API documentation, which may unintentionally give the impression that an advanced math degree is necessary for even practical integration of ML into software projects. Though math training is indeed beneficial, the ability to grasp and apply practical concepts (e.g., a model’s learning rate) to real-world problems does not require an advanced math degree.

What Do Developers Want From ML Frameworks?
Developers who responded to our survey wanted ML frameworks to teach them not only how to use the API, but also the unspoken idioms that would help them to effectively apply the framework to their own problems.

Pre-made Models with Explicit Support for Modification
A common desire was to have access to libraries of canonical ML models, so that they could modify an existing template rather than creating new ones from scratch. Currently, pre-trained models are being made more widely available in many ML platforms, including TensorFlow.js. However, in their current form, these models do not provide explicit support for novice consumption. For example, in our survey, developers reported substantial hurdles transferring and modifying existing model examples to their own use cases. Thus, the provision of pre-made ML models should also be coupled with explicit support for modification.

Synthesize ML Best Practices into Just-in-Time Hints
Developers also wished frameworks could provide ML best practices, i.e., practical tips and tricks that they could use when designing or debugging models. While ML experts may acquire heuristics and go-to strategies through years of dedicated trial and error, the mere decision overhead of “which parameter should I try tuning first?” can be overwhelming for developers who aren't ML experts. To help narrow this broad space of decision possibilities, ML frameworks could embed tips on best practices directly into the programming workflow. Currently, visualizations like TensorBoard and tfjs-vis help developers see what's going on inside their models.

Coupling these with just-in-time strategic pointers, such as whether to adapt a pre-trained model or to build one from scratch, or diagnostic checks, like practical tips to “decrease learning rate” if the model is not converging, could help users acquire and make use of practical strategies. These tips could serve as an intermediate scaffolding layer that helps demystify the math theory underlying ML into developer-friendly terms.

Support for Learning-by-Doing
Finally, even though ML frameworks are not traditional learning platforms, software developers are indeed treating them as lightweight vehicles for learning-by-doing. For example, one survey respondent appreciated when conceptual support was tightly interwoven into the framework, rather than being a separate resource: “...the small code demos that you can edit and run right there. Really helps basic understanding.” Another explained that “I prefer learning by doing, so I would like to see more tutorials, examples” embedded into ML frameworks. Some found it difficult to take a formal online course, and would rather learn in bite-sized pieces through hands-on tinkering: “Due to the rest of life, I have to fit learning into small 5-15 minute blocks.”

Given these desires to learn-by-doing, ML frameworks may need to more clearly distinguish between a spectrum of resources aimed at different levels of expertise. Although many frameworks already have “hello world” tutorials, to properly set expectations these frameworks could more explicitly differentiate between API (syntax-specific) onboarding and ML (conceptual) onboarding.

Looking Forward
Ultimately, as the frontiers of ML are still evolving, providing practical, conceptual tips for software developers and creating a shared reservoir of community-curated best practices can benefit ML experts and novices alike. Hopefully, these research findings pave the way for more user-centric designs of future ML frameworks.

Acknowledgements
This work would not have been possible without Yannick Assogba, Sandeep Gupta, Lauren Hannah-Murphy, Michael Terry, Ann Yuan, Nikhil Thorat, Daniel Smilkov, Martin Wattenberg, Fernanda Viegas, and members of PAIR and TensorFlow.js.

Source: Google AI Blog


Ultra-High Resolution Image Analysis with Mesh-TensorFlow



Deep neural network models form the backbone of most state-of-the-art image analysis and natural language processing algorithms. With the recent development of large-scale deep learning techniques such as data and model parallelism, large convolutional neural network (CNN) models can be trained on datasets of millions of images in minutes. However, applying a CNN model on ultra-high resolution images, such as 3D computed tomography (CT) images that can have up to 10⁸ pixels (a single 512×512×512 volume already contains roughly 1.3×10⁸ voxels), remains challenging. With existing techniques, a processor still needs to host a minimum of 32GB of partial, intermediate data, whereas individual GPUs or TPUs typically have only 12-32GB memory. A typical solution is to process image patches separately from one another, which leads to complicated implementation and sub-optimal performance due to information loss.

In “High Resolution Medical Image Analysis with Spatial Partitioning”, a collaboration with the Mayo Clinic, we push the boundary of massive data and model parallelism through use of the Mesh-TensorFlow framework, and demonstrate how this technique can be used for ultra-high resolution image analysis without compromising input resolution for practical feasibility. We implement a halo exchange algorithm to handle convolutional operations across spatial partitions in order to preserve relationships between neighboring partitions. As a result, we are able to train a 3D U-Net on ultra-high resolution images (3D images with 512 pixels in each dimension), with 256-way model parallelism. We have additionally open-sourced our Mesh-TensorFlow-based framework for both GPUs and TPUs for use by the broader research community.

Data and Model Parallelism with Mesh-TensorFlow
Our implementation is based on the Mesh-TensorFlow framework for easy and efficient data and model parallelism, which enables users to split tensors across a mesh of devices according to the user defined image layout. For example, users may provide the mesh of computational devices as 16 rows by 16 columns for a total of 256 processors, with two cores per processor. They then define the layout to map the spatial dimension x of their image to processor rows, map spatial dimension y to processor columns, and map the batch dimension (i.e., the number of image segments to be processed simultaneously) to cores. The partitioning and distributing of a training batch is implemented by Mesh-TensorFlow at the tensor level, without users worrying about implementation details. The figure below shows the concept with a simplified example:
Spatial partitioning of ultra-high resolution images, in this case, a 3D CT scan.
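A minimal sketch of this mesh-and-layout idiom in the Mesh-TensorFlow API follows; the mesh shape and dimension names mirror the example above, but the snippet is illustrative rather than our training code:

```python
import mesh_tensorflow as mtf

# Illustrative only: a 16x16 processor mesh, with the image's spatial x
# dimension split across mesh rows and y across mesh columns.
graph = mtf.Graph()
mesh = mtf.Mesh(graph, "image_mesh")
mesh_shape = mtf.convert_to_shape("rows:16;cols:16")
layout_rules = mtf.convert_to_layout_rules("x:rows;y:cols")

# Named dimensions for a batch of 3D volumes. Mesh-TensorFlow partitions
# tensors according to the layout rules, so user code never slices manually.
batch_dim = mtf.Dimension("batch", 2)
x_dim = mtf.Dimension("x", 512)
y_dim = mtf.Dimension("y", 512)
z_dim = mtf.Dimension("z", 512)
image = mtf.zeros(mesh, mtf.Shape([batch_dim, x_dim, y_dim, z_dim]))

# Binding the mesh to physical devices (e.g., via a placement or TPU mesh
# implementation) happens at lowering time, using mesh_shape and layout_rules.
```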
Spatial Partitioning with Halo Exchange
A convolution operation executed on an image often applies a filter that extends beyond the edge of the frame. While there are ways to address this when dealing with a single image, standard approaches do not take into account that for segmented images information beyond the frame edge may still be relevant. In order to yield accurate results, convolution operations on an image that has been spatially partitioned and redistributed across processors must take into account each image segment’s neighbors.

One potential solution might be to include overlapping regions in each spatial partition. However, since there are very likely many subsequent convolutional layers and each of them introduces overlap, the overlap will be relatively large — in fact, in most cases, the overlap could cover the entire image. Moreover, all overlapping regions must be included from the start, at the very first layer, which may run into the memory constraints that we are trying to resolve.

Our solution is totally different: we implemented a data communication step called halo exchange. Before every convolution operation, each spatial partition exchanges (receives and sends) margins with its neighbors, effectively expanding the image segment at its margins. The convolution operations are then applied locally on each device. This ensures that the result of the convolutions for the whole of the image remain identical with or without spatial partitioning.
Halo exchange ensures that cross-partition convolutions handle image segment edges correctly.
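The idea behind halo exchange can be illustrated in a few lines of NumPy. This is a single-process toy along one split axis, not the actual Mesh-TensorFlow implementation, which exchanges margins between devices:

```python
import numpy as np

def exchange_halos(left, right, halo):
    """Each partition borrows a margin of `halo` columns from its neighbor."""
    left_padded = np.concatenate([left, right[:, :halo]], axis=1)
    right_padded = np.concatenate([left[:, -halo:], right], axis=1)
    return left_padded, right_padded

image = np.arange(64, dtype=np.float32).reshape(8, 8)
left, right = image[:, :4], image[:, 4:]          # two spatial partitions
left_p, right_p = exchange_halos(left, right, halo=1)

# After the exchange, a "valid" 3x3 convolution applied locally to left_p
# and right_p produces the same values at the partition boundary as the
# same convolution applied to the full, unpartitioned image.
```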
Proof of Concept - Segmentation of Liver Tumor CT Scans
We then applied this framework to the task of segmenting 3D CT scans of liver tumors (LiTS benchmark). For the evaluation metric, we use the Sørensen–Dice coefficient, which ranges from 0.0 to 1.0, with a score of 0 indicating no overlap between segmented and ground truth tumor regions and 1 indicating a perfect match. The results shown below demonstrate that higher data resolution yields better results. Although the return tends to diminish when using the full 512³ resolution (512 pixels in each of the x, y, and z directions), this work does open the possibility for ultra-high resolution image analysis.
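For reference, the Sørensen–Dice coefficient between a predicted segmentation A and the ground truth B is:

```latex
DSC(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}
```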
Higher resolution data yields better segmentation accuracy.
Conclusion
Existing data and model parallelism techniques enabled the training of neural networks with billions of parameters, but cannot handle input images above ~10⁸ pixels. In this work, we explore the applicability of CNNs on these ultra-high resolution images, and demonstrate promising results. Our Mesh-TensorFlow-based implementation works on both GPUs and TPUs, and with the released code, we hope to provide a possible solution for some previously impossible tasks.

Acknowledgments
We thank our collaborators Panagiotis Korfiatis, Ph.D., and Daniel Blezek, Ph.D., from Mayo Clinic for providing the initial 3D U-net model and training data. Thank you Greg Mikels for the POC work with Mayo Clinic. Special thanks to all the co-authors of the paper especially Noam Shazeer.

Source: Google AI Blog


Setting Fairness Goals with the TensorFlow Constrained Optimization Library



Many technologies that use supervised machine learning are having an increasingly positive impact on people's day-to-day lives, from catching early signs of illnesses to filtering inappropriate content. There is, however, a growing concern that learned models, which generally satisfy the narrow requirement of minimizing a single loss function, may have difficulty addressing broader societal issues such as fairness, which generally requires trading-off multiple competing considerations. Even when such factors are taken into account, these systems may still be incapable of satisfying such complex design requirements, for example that a false negative might be “worse” than a false positive, or that the model being trained should be “similar” to a pre-existing model.

The TensorFlow Constrained Optimization (TFCO) library makes it easy to configure and train machine learning problems based on multiple different metrics (e.g. the precisions on members of certain groups, the true positive rates on residents of certain countries, or the recall rates of cancer diagnoses depending on age and gender). While these metrics are simple conceptually, by offering a user the ability to minimize and constrain arbitrary combinations of them, TFCO makes it easy to formulate and solve many problems of interest to the fairness community in particular (such as equalized odds and predictive parity) and the machine learning community more generally.

How Does TFCO Relate to Our AI Principles?
The release of TFCO puts our AI Principles into action, further helping guide the ethical development and use of AI in research and in practice. By putting TFCO into the hands of developers, we aim to better equip them to identify where their models can be risky and harmful, and to set constraints that ensure their models achieve desirable outcomes.

What Are the Goals?
Borrowing an example from Hardt et al., consider the task of learning a classifier that decides whether a person should receive a loan (a positive prediction) or not (negative), based on a dataset of people who either are able to repay a loan (a positive label), or are not (negative). To set up this problem in TFCO, we would choose an objective function that rewards the model for granting loans to those people who will pay them back, and would also impose fairness constraints that prevent it from unfairly denying loans to certain protected groups of people. In TFCO, the objective to minimize, and the constraints to impose, are represented as algebraic expressions (using normal Python operators) of simple basic rates.
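A minimal sketch of this setup with the TFCO API follows; the tensors are random stand-ins for a real model and dataset, and the 5% slack on the constraints is an arbitrary illustrative choice:

```python
import numpy as np
import tensorflow as tf
import tensorflow_constrained_optimization as tfco

# Hypothetical stand-ins for model scores, repayment labels, and a binary
# protected-group indicator.
scores = tf.Variable(np.random.randn(100).astype(np.float32))
labels = tf.constant((np.random.rand(100) > 0.5).astype(np.float32))
groups = tf.constant((np.random.rand(100) > 0.5).astype(np.float32))

# A rate context describes which examples each basic rate is computed over.
context = tfco.rate_context(lambda: scores, lambda: labels)
group_a = context.subset(lambda: groups > 0.5)
group_b = context.subset(lambda: groups <= 0.5)

# Objective: overall error rate. Constraints: an equal-opportunity pair,
# keeping the two groups' true positive rates within 5% of each other.
problem = tfco.RateMinimizationProblem(
    tfco.error_rate(context),
    [tfco.true_positive_rate(group_a) >= tfco.true_positive_rate(group_b) - 0.05,
     tfco.true_positive_rate(group_b) >= tfco.true_positive_rate(group_a) - 0.05])
```

The resulting problem can then be minimized with one of the library's constrained optimizers (e.g., tfco.LagrangianOptimizerV2).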

Instructing TFCO to minimize the overall error rate of the learned classifier for a linear model (with no fairness constraints), might yield a decision boundary that looks like this:
Illustration of a binary classification dataset with two protected groups: blue and orange. For ease of visualization, rather than plotting each individual data point, the densities are represented as ovals. The positive and negative signs denote the labels. The decision boundary, drawn as a black dashed line, separates positive predictions (regions above the line) from negative ones (regions below the line) and is chosen to maximize accuracy.
This is a fine classifier, but in certain applications, one might consider it to be unfair. For example, positively-labeled blue examples are much more likely to receive negative predictions than positively-labeled orange examples, violating the “equal opportunity” principle. To correct this, one could add an equal opportunity constraint to the constraint list. The resulting classifier would now look something like this:
Here the decision boundary is chosen to maximize the accuracy, subject to an equal opportunity (or true positive rate) constraint.
How Do I Know What Constraints To Set?
Choosing the “right” constraints depends on the policy goals or requirements of your problem and your users. For this reason, we’ve striven to avoid forcing the user to choose from a curated list of “baked-in” problems. Instead, we’ve tried to maximize flexibility by enabling the user to define an extremely broad range of possible problems, by combining and manipulating simple basic rates.

This flexibility can have a downside: if one isn’t careful, one might attempt to impose contradictory constraints, resulting in a constrained problem with no good solutions. In the context of the above example, one could constrain the false positive rates (FPRs) to be equal, in addition to the true positive rates (TPRs) (i.e., “equalized odds”). However, the potentially contradictory nature of these two sets of constraints, coupled with our requirement for a linear model, could force us to find a solution with extremely low accuracy. For example:
Here the decision boundary is chosen to maximize the accuracy, subject to both the true positive rate and false positive rate constraints.
With an insufficiently-flexible model, either the FPRs of both groups would be equal, but very large (as in the case illustrated above), or the TPRs would be equal, but very small (not shown).

Can It Fail?
The ability to express many fairness goals as rate constraints can help drive progress in the responsible development of machine learning, but it also requires developers to carefully consider the problem they are trying to address. For example, suppose one constrains the training to give equal accuracy for four groups, but that one of those groups is much harder to classify. In this case, it could be that the only way to satisfy the constraints is by decreasing the accuracy of the three easier groups, so that they match the low accuracy of the fourth group. This probably isn’t the desired outcome.

A “safer” alternative is to constrain each group to independently satisfy some absolute metric, for example by requiring each group to achieve at least 75% accuracy. Using such absolute constraints rather than relative constraints will generally keep the groups from dragging each other down. Of course, it is possible to ask for a minimum accuracy that isn’t achievable, so some conscientiousness is still required.
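Expressed in the same rate idiom, such absolute constraints might look like the following sketch; `context` and `group_masks` are hypothetical carry-overs from the earlier loan snippet:

```python
import tensorflow_constrained_optimization as tfco

# Sketch of "safer" absolute constraints: each group must independently
# reach at least 75% accuracy, rather than being tied to the other groups.
# `context` and `group_masks` are hypothetical, as in the earlier example.
group_contexts = [context.subset(lambda m=m: m) for m in group_masks]
problem = tfco.RateMinimizationProblem(
    tfco.error_rate(context),
    [tfco.accuracy_rate(g) >= 0.75 for g in group_contexts])
```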

The Curse of Small Sample Sizes
Another common challenge with using constrained optimization is that the groups to which constraints are applied may be under-represented in the dataset. Consequently, the stochastic gradients we compute during training will be very noisy, resulting in slow convergence. In such a scenario, we recommend that users impose the constraints on a separate rebalanced dataset that contains higher proportions from each group, and use the original dataset only to minimize the objective.

For example, in the Wiki toxicity example we provide, we wish to predict if a discussion comment posted on a Wiki talk page is toxic (i.e., contains “rude, disrespectful or unreasonable” content). Only 1.3% of the comments mention a term related to “sexuality”, and a large fraction of these comments are labelled toxic. Hence, training a CNN model without constraints on this dataset leads to the model believing that “sexuality” is a strong indicator of toxicity and results in a high false positive rate for this group. We use TFCO to constrain the false positive rate for four sensitive topics (sexuality, gender identity, religion and race) to be within 2%. To better handle the small group sizes, we use a “re-balanced” dataset to enforce the constraints and the original dataset only to minimize the objective. As shown below, the constrained model is able to significantly lower the false positive rates on the four topic groups, while maintaining almost the same accuracy as the unconstrained model.
Comparison of unconstrained and constrained CNN models for classifying toxic comments on Wiki Talk pages.
Intersectionality – The Challenge of Fine Grained Groups
Overlapping constraints can help create equitable experiences for multiple categories of historically marginalized and minority groups. Extending beyond the above example, we also provide a CelebA example that examines a computer vision model for detecting smiles in images that we wish to perform well across multiple non-mutually-exclusive protected groups. The false positive rate can be an appropriate metric here, since it measures the fraction of images not containing a smiling face that are incorrectly labeled as smiling. By comparing false positive rates based on available age group (young and old) or sex (male and female) categories, we can check for undesirable model bias (i.e., whether images of older people that are smiling are not recognized as such).
Comparison of unconstrained and constrained CNN models for smile detection on the CelebA dataset.
Under the Hood
Correctly handling rate constraints is challenging because, being written in terms of counts (e.g., the accuracy rate is the number of correct predictions, divided by the number of examples), the constraint functions are non-differentiable. Algorithmically, TFCO converts a constrained problem into a non-zero-sum two-player game (ALT’19, JMLR’19). This framework can be extended to handle the ranking and regression settings (AAAI’20), more complex metrics such as the F-measure (NeurIPS’19a), or to improve generalization performance (ICML’19).

It is our belief that the TFCO library will be useful in training ML models that take into account the societal and cultural factors necessary to satisfy real-world requirements. Our provided examples (toxicity classification and smile detection) only scratch the surface. We hope that TFCO’s flexibility enables you to handle your problem’s unique requirements.

Acknowledgements
This work was a collaborative effort by the authors of TFCO and associated research papers, including Andrew Cotter, Maya R. Gupta, Heinrich Jiang, Harikrishna Narasimhan, Taman Narayan, Nathan Srebro, Karthik Sridharan, Serena Wang, Blake Woodworth, and Seungil You.

Source: Google AI Blog


MediaPipe on the Web

Posted by Michael Hays and Tyler Mullen from the MediaPipe team

MediaPipe is a framework for building cross-platform multimodal applied ML pipelines. We have previously demonstrated building and running ML pipelines as MediaPipe graphs on mobile (Android, iOS) and on edge devices like Google Coral. In this article, we are excited to present MediaPipe graphs running live in the web browser, enabled by WebAssembly and accelerated by the XNNPack ML Inference Library. By integrating this preview functionality into our web-based Visualizer tool, we provide a playground for quickly iterating over a graph design. Since everything runs directly in the browser, video never leaves the user’s computer and each iteration can be immediately tested on a live webcam stream (and soon, arbitrary video).

Figure 1: Running the MediaPipe face detection example in the Visualizer

MediaPipe Visualizer

MediaPipe Visualizer (see Figure 2) is hosted at viz.mediapipe.dev. MediaPipe graphs can be inspected by pasting graph code into the Editor tab or by uploading a graph file into the Visualizer. A user can pan and zoom into the graphical representation of the graph using the mouse and scroll wheel, and the graph reacts in real time to changes made within the editor.

Figure 2: MediaPipe Visualizer hosted at https://viz.mediapipe.dev

Demos on MediaPipe Visualizer

We have created several sample Visualizer demos from existing MediaPipe graph examples. These can be seen within the Visualizer by visiting the following addresses in your Chrome browser:

  • Edge Detection
  • Face Detection
  • Hair Segmentation
  • Hand Tracking

Each of these demos can be executed within the browser by clicking on the little running man icon at the top of the editor (it will be greyed out if a non-demo workspace is loaded):

This will open a new tab that runs the current graph (a webcam is required).

Implementation Details

In order to maximize portability, we use Emscripten to compile all of the necessary C++ code directly into WebAssembly, a low-level, portable binary instruction format designed for web browsers. At runtime, the browser executes these instructions inside a virtual machine much faster than it could run traditional JavaScript code.

We also created a simple API for all necessary communication back and forth between JavaScript and C++, allowing us to change and interact with the MediaPipe graph directly from JavaScript. For readers familiar with Android development, this is similar to authoring a C++/Java bridge using the Android NDK.

Finally, we packaged up all the requisite demo assets (ML models and auxiliary text/data files) as individual binary data packages, to be loaded at runtime. And for graphics and rendering, we allow MediaPipe to automatically tap directly into WebGL so that most OpenGL-based calculators can “just work” on the web.

Performance

While executing WebAssembly is generally much faster than pure JavaScript, it is also usually much slower than native C++, so we made several optimizations to provide a better user experience. We utilize the GPU for image operations when possible, and use the lightest-weight versions of all our ML models (trading some quality for speed). However, since compute shaders are not widely available on the web, we cannot easily make use of TensorFlow Lite GPU inference, and the resulting CPU inference often ends up being a significant performance bottleneck. To help alleviate this, we automatically augment our “TfLiteInferenceCalculator” to use the XNNPack ML Inference Library, which gives us a 2-3x speedup in most of our applications.

Currently, support for web-based MediaPipe has some important limitations:

  • Only calculators in the demo graphs above may be used
  • The user must edit one of the template graphs; they cannot provide their own from scratch
  • The user cannot add or alter assets
  • The executor for the graph must be single-threaded (i.e. ApplicationThreadExecutor)
  • TensorFlow Lite inference on GPU is not supported

We plan to continue building on this new platform to give developers much more control, removing many if not all of these limitations (e.g., by allowing for dynamic management of assets). Please follow the MediaPipe tag on the Google Developers blog and the Google Developers Twitter account (@googledevs).

Acknowledgements

We would like to thank Marat Dukhan, Chuo-Ling Chang, Jianing Wei, Ming Guang Yong, and Matthias Grundmann for contributing to this blog post.

New Coral products for 2020

Posted by Billy Rutledge, Director Google Research, Coral Team

More and more industries are beginning to recognize the value of local AI, where the speed of local inference allows considerable savings on bandwidth and cloud compute costs, and keeping data local preserves user privacy.

Last year, we launched Coral, our platform of hardware components and software tools that make it easy to prototype and scale local AI products. Our product portfolio includes the Coral Dev Board, USB Accelerator, and PCIe Accelerators, all now available in 36 countries.
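As a quick sense of what prototyping with these tools looks like, here is a hedged sketch of running inference on an Edge TPU device with the tflite_runtime Python API; the model filename is a placeholder, and the delegate library name assumes a Linux host:

```python
import numpy as np
import tflite_runtime.interpreter as tflite

# A model compiled for the Edge TPU is required; this path is a placeholder.
interpreter = tflite.Interpreter(
    model_path='mobilenet_v2_edgetpu.tflite',
    experimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Feed one dummy frame; a real application would feed camera frames.
frame = np.zeros(input_details['shape'], dtype=input_details['dtype'])
interpreter.set_tensor(input_details['index'], frame)
interpreter.invoke()
scores = interpreter.get_tensor(output_details['index'])
print('top class:', int(np.argmax(scores)))
```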

Since our release, we’ve been excited by the diverse range of applications already built on Coral across industries ranging from healthcare to agriculture to smart cities. For 2020, we’re excited to announce new additions to the Coral platform that will expand the possibilities even further.

First up is the Coral Accelerator Module, an easy-to-integrate multi-chip package that encapsulates the Edge TPU ASIC. The module exposes both PCIe and USB interfaces and can easily integrate into custom PCB designs. We’ve been working closely with Murata to produce the module and you can see a demo at CES 2020 by visiting their booth at the Las Vegas Convention Center, Tech East, Central Plaza, CP-18. The Coral Accelerator Module will be available in the first half of 2020.

Coral Accelerator Module, a new multi-chip module with the Google Edge TPU

Next, we’re announcing the Coral Dev Board Mini, which provides a smaller form factor, lower-power, and lower-cost alternative to the Coral Dev Board. The Mini combines the new Coral Accelerator Module with the MediaTek 8167s SoC to create a board that excels at 720p video encoding/decoding and computer vision use cases. The board will be on display during CES 2020 at the MediaTek showcase located in the Venetian, Tech West, Level 3. The Coral Dev Board Mini will be available in the first half of 2020.

We're also offering new variations of the Coral System-on-Module, now available with 2GB and 4GB LPDDR4 RAM in addition to the original 1GB LPDDR4 configuration. We’ll be showcasing how the SoM can be used in smart city, manufacturing, and healthcare applications, as well as a few new SoC and MCU explorations we’ve been working on with the NXP team at CES 2020, in their pavilion located at the Las Vegas Convention Center, Tech East, Central Plaza, CP-18.

Finally, Asus has chosen the Coral SoM as the base of its Tinker Edge T product, a maker-friendly single-board computer that features a rich set of I/O interfaces, multiple camera connectors, programmable LEDs, and a color-coded GPIO header. The Tinker Edge T board will be available soon; more details can be found here from Asus.

Come visit Coral at CES Jan 7-10 in Las Vegas:

  • NXP exhibit (LVCC, Tech East, Central Plaza, CP-18)
  • MediaTek exhibit (Venetian, Tech West, Level 3)
  • Murata exhibit (LVCC, South Hall 2, MP26061)

As always, we are looking for ways to improve the platform, so keep reaching out to us at [email protected].

Fairness Indicators: Scalable Infrastructure for Fair ML Systems



While industry and academia continue to explore the benefits of using machine learning (ML) to make better products and tackle important problems, algorithms and the datasets on which they are trained also have the ability to reflect or reinforce unfair biases. For example, consistently flagging non-toxic text comments from certain groups as “spam” or “high toxicity” in a moderation system leads to exclusion of those groups from conversation.

In 2018, we shared how Google uses AI to make products more useful, highlighting AI principles that will guide our work moving forward. The second principle, “Avoid creating or reinforcing unfair bias,” outlines our commitment to reduce unjust biases and minimize their impacts on people.

As part of this commitment, at TensorFlow World, we recently released a beta version of Fairness Indicators, a suite of tools that enable regular computation and visualization of fairness metrics for binary and multi-class classification, helping teams take a first step towards identifying unjust impacts. Fairness Indicators can be used to generate metrics for transparency reporting, such as those used for model cards, to help developers make better decisions about how to deploy models responsibly. Because fairness concerns and evaluations differ case by case, we also include in this release an interactive case study with Jigsaw’s Unintended Bias in Toxicity dataset to illustrate how Fairness Indicators can be used to detect and remediate bias in a production machine learning (ML) model, depending on the context in which it is deployed. Fairness Indicators is now available in beta for you to try for your own use cases.

What is ML Fairness?
Bias can manifest in any part of a typical machine learning pipeline, from an unrepresentative dataset, to learned model representations, to the way in which the results are presented to the user. Errors that result from this bias can disproportionately impact some users more than others.

To detect this unequal impact, evaluation over individual slices, or groups of users, is crucial as overall metrics can obscure poor performance for certain groups. These groups may include, but are not limited to, those defined by sensitive characteristics such as race, ethnicity, gender, nationality, income, sexual orientation, ability, and religious belief. However, it is also important to keep in mind that fairness cannot be achieved solely through metrics and measurement; high performance, even across slices, does not necessarily prove that a system is fair. Rather, evaluation should be viewed as one of the first ways, especially for classification models, to identify gaps in performance.
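As a toy illustration of why sliced evaluation matters, the numpy sketch below uses made-up labels and predictions: overall accuracy looks acceptable while one slice fares far worse.

```python
import numpy as np

labels = np.array([1, 0, 1, 1, 0, 0, 1, 0])
preds  = np.array([1, 0, 1, 1, 1, 1, 1, 1])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=bool)  # one user slice

print('overall accuracy:', (labels == preds).mean())                 # 0.625
print('slice accuracy:  ', (labels[group] == preds[group]).mean())   # 0.25
```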

The Fairness Indicators Suite of Tools
The Fairness Indicators tool suite enables computation and visualization of commonly identified fairness metrics for classification models, such as false positive rate and false negative rate, making it easy to compare performance across slices or to a baseline slice. The tool computes confidence intervals, which can surface statistically significant disparities, and performs evaluation over multiple thresholds. In the UI, it is possible to toggle the baseline slice and investigate the performance of various other metrics. The user can also add their own metrics for visualization, specific to their use case.

Furthermore, Fairness Indicators is integrated with the What-If Tool (WIT): clicking on a bar in the Fairness Indicators graph will load those specific data points into the WIT widget for further inspection, comparison, and counterfactual analysis. This is particularly useful for large datasets, where Fairness Indicators can be used to identify problematic slices before the WIT is used for a deeper analysis.
Using Fairness Indicators to visualize metrics for fairness evaluation.
Clicking on a slice in Fairness Indicators will load all the data points in that slice inside the What-If Tool widget. In this case, all data points with the “female” label are shown.
The Fairness Indicators beta launch includes the following:
How To Use Fairness Indicators in Models Today
Fairness Indicators is built on top of TensorFlow Model Analysis (TFMA), a component of TensorFlow Extended (TFX) that can be used to investigate and visualize model performance. Based on the specific ML workflow, Fairness Indicators can be incorporated into a system in one of the following ways (a minimal sketch of the standalone path follows the list):
If using TensorFlow models and tools, such as TFX:
  • Access Fairness Indicators as part of the Evaluator component in TFX
  • Access Fairness Indicators in TensorBoard when evaluating other real-time metrics
If not using existing TensorFlow tools:
  • Download the Fairness Indicators pip package, and use TensorFlow Model Analysis as a standalone tool
For non-TensorFlow models:
  • Use the model-agnostic TFMA library to compute Fairness Indicators from the outputs of any model
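Below is a minimal sketch of the standalone TensorFlow Model Analysis path, assuming the beta-era TFMA API; all paths, the slicing column, and the thresholds are placeholders:

```python
import tensorflow_model_analysis as tfma
from tensorflow_model_analysis.addons.fairness.view import widget_view

# Attach the Fairness Indicators metrics to an exported eval model.
eval_shared_model = tfma.default_eval_shared_model(
    eval_saved_model_path='/path/to/eval_saved_model',
    add_metrics_callbacks=[
        tfma.post_export_metrics.fairness_indicators(
            thresholds=[0.25, 0.5, 0.75])])

# Evaluate over the dataset, sliced by a sensitive feature.
eval_result = tfma.run_model_analysis(
    eval_shared_model=eval_shared_model,
    data_location='/path/to/eval_data.tfrecord',
    slice_spec=[tfma.slicer.SingleSliceSpec(columns=['gender'])],
    output_path='/path/to/output')

# Render the Fairness Indicators widget in a notebook.
widget_view.render_fairness_indicator(eval_result=eval_result)
```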
Fairness Indicators Case Study
We created a case study and introductory video that illustrates how Fairness Indicators can be used with a combination of tools to detect and mitigate bias in a model trained on Jigsaw’s Unintended Bias in Toxicity dataset. The dataset was developed by Conversation AI, a team within Jigsaw that works to train ML models to protect voices in conversation. Models are trained to predict whether text comments are likely to be abusive along a variety of dimensions including toxicity, insult, and sexual explicitness.

The primary use case for models such as these is content moderation. If a model penalizes certain types of messages in a systematic way (e.g., often marks comments as toxic when they are not, leading to a high false positive rate), those voices will be silenced. In the case study, we investigated false positive rate on subgroups sliced by gender identity keywords that are present in the dataset, using a combination of tools (Fairness Indicators, TFDV, and WIT) to detect, diagnose, and take steps toward remediating the underlying problem.

What’s next?
Fairness Indicators is only the first step. We plan to expand vertically by supporting more metrics, such as metrics that enable you to evaluate classifiers without thresholds, and horizontally by creating remediation libraries that utilize methods such as active learning and min-diff. Because we believe it is important to learn through real examples, we hope to ground our work in more case studies over the next few months, as more features become available.

To get started, see the Fairness Indicators GitHub repo. For more information on how to think about fairness evaluation in the context of your use case, see this link.

We would love to partner with you to understand where Fairness Indicators is most useful, and where added functionality would be valuable. Please reach out at [email protected] to provide any feedback on your experience!

Acknowledgements
The core team behind this work includes Christina Greer, Manasi Joshi, Huanming Fang, Shivam Jindal, Karan Shukla, Osman Aka, Sanders Kleinfeld, Alicia Chang, Alex Hanna, and Dan Nanas. We would also like to thank James Wexler, Mahima Pushkarna, Meg Mitchell and Ben Hutchinson for their contributions to the project.

Source: Google AI Blog