If you are participating in CVPR this year, please visit our virtual booth to learn about the work Google is pursuing toward the next generation of intelligent systems, which applies the latest machine learning techniques to a wide range of machine perception problems.
You can also learn more about our research being presented at CVPR 2020 in the list below (Google affiliations are bolded).
Organizing Committee General Chairs: Terry Boult, Gerard Medioni, Ramin Zabih Program Chairs: Ce Liu, Greg Mori, Kate Saenko, Silvio Savarese Workshop Chairs: Tal Hassner, Tali Dekel Website Chairs: Tianfan Xue, Tian Lan Technical Chair: Daniel Vlasic Area Chairs include: Alexander Toshev, Alexey Dosovitskiy, Boqing Gong, Caroline Pantofaru, Chen Sun, Deqing Sun, Dilip Krishnan, Feng Yang, Liang-Chieh Chen, Michael Rubinstein, Rodrigo Benenson, Timnit Gebru, Thomas Funkhouser, Varun Jampani, Vittorio Ferrari, William Freeman
Scalability in Perception for Autonomous Driving: Waymo Open Dataset Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurélien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Sheng Zhao, Shuyang Chen, Yu Zhang, Jon Shlens, Zhifeng Chen, Dragomir Anguelov
Deep Implicit Volume Compression Saurabh Singh, Danhang Tang, Cem Keskin, Philip Chou, Christian Haene, Mingsong Dou, Sean Fanello, Jonathan Taylor, Andrea Tagliasacchi, Philip Davidson, Yinda Zhang, Onur Guleryuz, Shahram Izadi, Sofien Bouaziz
This week marks the beginning of the 8th International Conference on Learning Representations (ICLR 2020), a fully virtual conference focused on how one can learn meaningful and useful representations of data for machine learning. ICLR offers conference and workshop tracks, both of which include invited talks along with oral and poster presentations of some of the latest research on deep learning, metric learning, kernel learning, compositional models, non-linear structured prediction and issues regarding non-convex optimization.
As a Diamond Sponsor of ICLR 2020, Google will have a strong virtual presence with over 80 publications accepted, in addition to participating on organizing committees and in workshops. If you have registered for ICLR 2020, we hope you'll watch our talks and learn about the projects and opportunities at Google that go into solving interesting problems for billions of people. You can also learn more about our research being presented at ICLR 2020 in the list below (Googlers highlighted in blue).
Officers and Board Members Includes: Hugo LaRochelle, Samy Bengio, Tara Sainath
Organizing Committee Includes: Kevin Swersky, Timnit Gebru
Area Chairs Includes: Balaji Lakshminarayanan, Been Kim, Chelsea Finn, Dale Schuurmans, George Tucker, Honglak Lee, Hossein Mobahi, Jasper Snoek, Justin Gilmer, Katherine Heller, Manaal Faruqui, Michael Ryoo, Nicolas Le Roux, Sanmi Koyejo, Sergey Levine, Tara Sainath, Yann Dauphin, Anders Søgaard, David Duvenaud, Jamie Morgenstern, Qiang Liu
Model Based Reinforcement Learning for Atari (see the blog post) Łukasz Kaiser, Mohammad Babaeizadeh, Piotr Miłos, Błazej Osinski, Roy H. Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski
Tackling Climate Change with Machine Learning Organizing Committee: Moustapha Cisse Co-Organizer: Natasha Jaques Program Committee: John C. Platt, Kevin McCloskey, Natasha Jaques Advisor and Panel: John C. Platt
Posted by Jeff Dean, Senior Fellow and SVP of Google Research and Health, on behalf of the entire Google Research community The goal of Google Research is to work on long-term, ambitious problems, with an emphasis on solving ones that will dramatically help people throughout their daily lives. In pursuit of that goal in 2019, we made advances in a broad set of fundamental research areas, applied our research to new and emerging areas such as healthcare and robotics, open sourced a wide variety of code and continued collaborations with Google product teams to build tools and services that are dramatically more helpful for our users.
As we start 2020, it’s useful to take a step back and assess the research work we’ve done over the past year, and also to look forward to what sorts of problems we want to tackle in the upcoming years. In that spirit, this blog post is a survey of some of the research-focused work done by Google researchers and engineers during 2019 (in the spirit of similar reviews for 2018, and more narrowly focused reviews of some work in 2017 and 2016). For a more comprehensive look, please see our research publications in 2019.
Ethical Use of AI In 2018, we published a set of AI Principles that provide a framework by which we evaluate our own research and applications of technologies such as machine learning in our products. In June 2019, we published a one-year update about how these principles are being put into practice in many different aspects of our research and product development life cycles. Since many of the areas touched on by the principles are active areas of research in the broader AI and machine learning research community (such as bias, safety, fairness, accountability, transparency and privacy in machine learning systems), our goals are to apply the best currently-known techniques in these areas to our work, and also to do research to continue to advance the state of the art in these important areas.
Released a beta version of Fairness Indicators, to help ML practitioners identify unjust or unintended impacts of machine learning models (a small illustrative sketch of this kind of sliced evaluation appears after this list).
Clicking on a slice in Fairness Indicators will load all the data points in that slice inside the What-If Tool widget. In this case, all data points with the “female” label are shown.
Published a KDD'19 paper on how pairwise comparisons and regularization are incorporated into a large-scale production recommender system to improve ML Fairness.
Published an AIES'19 paper about a case study on the application of fairness in machine learning research to a production classification system, and described our fairness metric, conditional equality, that takes into account distributional differences in implementing equality of opportunity.
Published an AIES'19 paper about counterfactual fairness in text classification problems that asks the question: "How would the prediction change if the sensitive attribute referenced in the example were different?" and used this approach to improve our production systems that assess the toxicity of online content.
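Several of the items above revolve around evaluating a model's error rates separately on slices of the data defined by a sensitive attribute. The snippet below is a minimal, hypothetical sketch of that sliced-evaluation idea in plain Python (it is not the Fairness Indicators API itself): it computes a false positive rate per slice so that disparities between groups become visible.

```python
import numpy as np

def false_positive_rate(labels, preds):
    """FPR = FP / (FP + TN), computed over binary labels/predictions."""
    negatives = labels == 0
    if negatives.sum() == 0:
        return float("nan")
    return float(np.logical_and(preds == 1, negatives).sum() / negatives.sum())

def fpr_by_slice(labels, preds, groups):
    """Return {group_value: FPR} so per-slice disparities are easy to spot."""
    return {g: false_positive_rate(labels[groups == g], preds[groups == g])
            for g in np.unique(groups)}

# Toy example with a hypothetical sensitive attribute.
labels = np.array([0, 0, 1, 0, 1, 0, 0, 1])
preds  = np.array([1, 0, 1, 0, 1, 1, 0, 0])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(fpr_by_slice(labels, preds, groups))  # e.g. {'a': 0.33..., 'b': 0.5}
```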
A sample of videos from Google’s contribution to the FaceForensics benchmark. To generate these, pairs of actors were selected randomly, and deep neural networks swapped the face of one actor onto the head of another.
In 2019 we updated Google Earth Timelapse, enabling people to effectively and intuitively visualize how the planet has changed over the past 35 years. Further, we’ve been collaborating with academic researchers on new privacy-preserving ways to aggregate data on human mobility, to give urban planners better information about how to design efficient environments with lower levels of carbon emissions. We’ve also applied machine learning to support childhood learning. According to the United Nations, 617 million children do not have basic literacy, a critical determinant of their quality of life. To help more children learn to read, our Bolo app uses speech recognition technology to tutor students in real time. And to increase access, the app works completely offline on low-cost phones. In India, Bolo has already helped 800,000 children read stories and speak half a billion words. Early results are encouraging; a three-month pilot among 200 villages in India showed an improvement in reading proficiency among 64% of pilot participants.
For older students, the Socratic app can help high schoolers with complex problems in math, physics and over 1,000 higher education topics. Based on a photo or verbal question, the app automatically identifies the question’s underlying concepts and links to the most helpful online resources. Like the Socratic method, the app doesn’t directly answer questions, but instead leads students to discover the answer themselves. We’re excited about the broad possibilities of improving educational outcomes around the world through things like Bolo and Socratic.
To expand the reach of our AI for Social Good efforts, in May we announced the grantees of our AI Impact Challenge with $25 million in grants from Google.org. The response was huge: we received over 2,600 thoughtful proposals from 119 countries. Twenty impressive organizations stood out for their potential to solve big social and environmental problems and were our initial set of grantees. A few examples of the work of these organizations:
Over a billion people live in smallholder farm households. A single pest attack can devastate their crop yields and livelihoods. Wadhwani AI uses image classification models that can identify pests and provide timely advice on what pesticides to spray and when—ultimately improving crop yield.
And deep in tropical rainforests, where illegal deforestation is a major driver of climate change, Rainforest Connection uses deep learning for bioacoustic monitoring and old cell phones to track rainforest health and detect threats.
Our 20 AI Impact Challenge winners. You can learn more about the work of all the grantees here.
Applications of AI to Other Fields The application of computer science and machine learning to other scientific fields is an area that we are especially excited about and have published a number of papers in, often in multi-organization collaborations. Some highlights from this year include:
In An Interactive, Automated 3D Reconstruction of a Fly Brain, we reported on a collaborative effort that achieved a milestone of mapping the structure of an entire fly brain, using machine learning models that were able to painstakingly trace each individual neuron.
In Learning Better Simulation Methods for Partial Differential Equations (PDEs), we showed how machine learning can be used to accelerate PDE computations, which are at the heart of many fundamental computational problems in climate science, fluid dynamics, electromagnetism, heat conduction and general relativity.
Simulations of Burgers’ equation, a model for shock waves in fluids, solved with either a standard finite volume method (left) or our neural network based method (right). The orange squares represent simulations with each method on low resolution grids. These points are fed back into the model at each time step, which then predicts how they should change. Blue lines show the exact simulations used for training. The neural network solution is much better, even on a 4x coarser grid, as indicated by the orange squares smoothly tracing the blue line.
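To make the classical baseline in the figure above concrete, here is a minimal finite-volume-style sketch of inviscid Burgers' equation on a periodic grid; it is a toy stand-in for the kind of standard solver the learned method is compared against, not the method from the paper.

```python
import numpy as np

def burgers_step(u, dx, dt):
    """One explicit step of inviscid Burgers' equation u_t + (u^2/2)_x = 0
    on a periodic grid, using a local Lax-Friedrichs (Rusanov) flux."""
    f = 0.5 * u**2
    u_right = np.roll(u, -1)                         # u_{i+1}
    a = np.maximum(np.abs(u), np.abs(u_right))       # local wave speed
    flux = 0.5 * (f + np.roll(f, -1)) - 0.5 * a * (u_right - u)  # F_{i+1/2}
    return u - dt / dx * (flux - np.roll(flux, 1))

# Toy run: a sine wave steepens into a shock, as in the figure above.
n = 256
x = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
dx = x[1] - x[0]
u = np.sin(x)
dt = 0.5 * dx / (np.abs(u).max() + 1e-8)             # CFL-limited time step
for _ in range(400):
    u = burgers_step(u, dx, dt)
```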
2D snapshot of our embedding space with some example odors highlighted. Left: Each odor is clustered in its own space. Right: The hierarchical nature of the odor descriptor. Shaded and contoured areas are computed with a kernel-density estimate of the embeddings.
Machine learning can also help us in our artistic and creative endeavors. Artists have found ways to collaborate with AI and AR and create interesting new forms, from dancing with a machine to reimagine choreography, to creating new melodies with machine learning tools. ML can be used by novices, too. To honor the birthday of J.S. Bach, we featured an ML-powered Doodle: just compose your own melody, and the ML tool generates accompanying harmonizations in Bach’s style.
Assistive Technology On a more personal scale, ML can help us in our daily lives. It’s easy to take for granted our ability to see a beautiful image, to hear a favorite song, or to speak with a loved one. Yet over one billion people aren’t able to access the world in these ways. ML technology can help by turning these signals—vision, hearing, speech—into other signals that can be well-managed by people with accessibility needs, enabling better access to the world around them. A few examples of our assistive technology:
Lookout helps people who are blind or have low vision identify information about their surroundings. It draws upon similar underlying technology as Google Lens, which lets you search and take action on the objects around you, simply by pointing your phone.
Live Transcribe has the potential to give people who are deaf or hard of hearing greater independence in their everyday interactions. It provides real-time transcriptions of the conversations a user is engaged in, even if the speech is in another language.
Project Euphonia performs personalized speech-to-text transcription. For people with ALS and other conditions that produce slurred or non-standard speech, this research improves automatic speech recognition (ASR) over other state-of-the-art ASR models.
Like Project Euphonia, Parrotron uses end-to-end neural networks to help improve communication, but the research focuses on automatic speech-to-speech conversion rather than transcription, presenting a speech interface that may be easier for some to access.
Millions of images online don’t have any text description. Get Image Descriptions from Google helps blind or low vision users understand unlabelled images. When a screen reader encounters an image or graphic without a description, Chrome can now create one automatically.
We developed tools in Lens for Google Go that read visible text aloud, greatly helping users who are not fully literate navigate the word-rich world around them.
Making Your Phone More Intelligent Much of our work serves to enable intelligent, personal devices by giving mobile phones new capabilities through the use of on-device machine learning. By making powerful models that can run on-device, we can ensure that these phone features are highly responsive and always available even in airplane mode or otherwise off the network. We’ve made progress in getting highly accurate speech recognition models, vision models and handwriting recognition models all running on-device, paving the way for powerful new features. Some of this year’s highlights include:
The creation of a powerful new transcribing Recorder app, which can help index audio information and make it easily retrievable.
Improvements to Google Translate’s camera translation, so that you can point at text in an unfamiliar language and get it instantly translated in context.
Federated learning (check out the online comic description!) is a powerful machine learning approach invented by Google researchers in 2015, whereby many clients (such as mobile devices or whole organizations) collaboratively train a model, while keeping the training data decentralized. This enables approaches that have superior privacy properties in large-scale learning systems. We are using federated learning in more and more of our products and features, while also working to advance the state of the art in many research problems in this space. In 2019, Google researchers collaborated with authors from 24 (!) academic institutions to produce a survey article on Federated Learning, highlighting advances over the past few years as well as describing a number of open research problems in the field.
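The paragraph above describes the protocol at a high level. Below is a minimal sketch of one round of federated averaging in plain numpy, assuming a toy linear-regression model and hypothetical client datasets; it illustrates the decentralized-training idea, not Google's production implementation or the TensorFlow Federated API.

```python
import numpy as np

def local_sgd(weights, x, y, lr=0.1, epochs=5):
    """Client-side training on local data for a simple linear regression model."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * x.T @ (x @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad
    return w

def federated_averaging_round(global_w, client_datasets):
    """One round of FedAvg: clients train locally, the server averages weights
    (weighted by number of local examples). Raw data never leaves the client."""
    client_ws, sizes = [], []
    for x, y in client_datasets:
        client_ws.append(local_sgd(global_w, x, y))
        sizes.append(len(y))
    return np.average(np.stack(client_ws), axis=0, weights=np.array(sizes, float))

# Toy example: three hypothetical clients with small local datasets.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    x = rng.normal(size=(20, 2))
    clients.append((x, x @ true_w + 0.1 * rng.normal(size=20)))

w = np.zeros(2)
for _ in range(10):
    w = federated_averaging_round(w, clients)
print(w)  # approaches [2.0, -1.0]
```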
Health In late 2018, we combined the Google Research health team, DeepMind Health and a team from Google’s Hardware division focused on health-related applications to form Google Health. In 2019 we continued the research we’ve been pursuing in this space, publishing research papers and building tools in collaboration with a variety of healthcare partners. Here are a few of the highlights from 2019:
We showed that a deep learning model for mammography can assist physicians in spotting breast cancer, a condition that affects 1 in 8 women in the US during their lifetimes, with greater accuracy than experts, reducing both false positives and false negatives. The model trained on de-identified data from a UK hospital had similar gains in accuracy when used to evaluate patients in a completely different healthcare system in the U.S.
Example of a difficult-to-detect cancer case correctly identified by machine learning.
Working alongside experts from the US Department of Veterans Affairs (VA), DeepMind Health colleagues who are now part of Google Health showed that a machine learning model can predict the onset of acute kidney injury (AKI), one of the leading causes of avoidable patient harm, up to two days before it happens. In the future, this could give doctors a 48-hour head start in treating this serious condition.
We showed a promising step forward for predicting lung cancer, where a deep learning model examining the results of a single CT scan study performed on par with or better than trained radiologists at early detection of lung cancer. Early detection of lung cancer dramatically improves survival rates.
We published a research paper on an augmented reality microscope for cancer diagnosis, whereby a pathologist can get real-time feedback about what parts of a slide are most interesting while examining tissue through a microscope. You can also read more about it in our 2018 blog post here.
Quantum Computing In 2019, our quantum computing team demonstrated for the first time a computational task that can be executed exponentially faster on a quantum processor than on the world’s fastest classical computer — just 200 seconds compared to 10,000 years.
Left: Artist's rendition of the Sycamore processor mounted in the cryostat. (Full Res Version; Forest Stearns, Google AI Quantum Artist in Residence) Right: Photograph of the Sycamore processor. (Full Res Version; Erik Lucero, Research Scientist and Lead Production Quantum Hardware)
Using quantum computers may make important problems in domains like materials science, quantum chemistry (early example) and large-scale optimization tractable, but in order to make this a reality, we’ll have to continue to push the field forward. We are now focusing on implementing quantum error correction so that we will be able to run computations for longer. We are also working on making quantum algorithms easier to express, the hardware easier to control and we have found ways to use classical machine learning techniques like deep reinforcement learning to build more reliable quantum processors. The achievements this year are encouraging and are early steps along the way to making practical quantum computing a reality for a wider variety of problems.
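For readers who want to experiment with quantum circuits themselves, our open-source Cirq library exposes the basic programming model. Below is a minimal sketch that prepares and samples a Bell state on the built-in simulator; it is a generic toy circuit, unrelated to the specific circuits used in the beyond-classical experiment.

```python
import cirq

# Two qubits on a line.
q0, q1 = cirq.LineQubit.range(2)

# Prepare a Bell state and measure both qubits.
circuit = cirq.Circuit([
    cirq.H(q0),
    cirq.CNOT(q0, q1),
    cirq.measure(q0, q1, key="m"),
])

# Sample the circuit on the built-in simulator.
result = cirq.Simulator().run(circuit, repetitions=100)
print(result.histogram(key="m"))  # expect roughly half 0 (|00>) and half 3 (|11>)
```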
We published a paper at VLDB’19 titled "Cache-aware load balancing of data center applications," although an alternative title could be "Increase the serving capacity of your data center by 40% with this one cool trick!". The paper describes how we used balanced partitioning of graphs to specialize the caches in our web search backend serving system, thereby increasing the query throughput of our flash drives by 48%, and helping to enable a 40% increase in the throughput of the entire search backend.
Heatmap of flash IO requests (resulting from cache misses) across web search serving leaves. The three humps represent random leaf selection, load balancing, and cache-aware load balancing (left to right). Lines indicate the 50th, 90th, 95th and 99.9th percentiles. From VLDB’19 paper, "Cache-aware load balancing of data center applications."
In an ICLR’19 paper titled "A new dog learns old tricks: RL finds classic optimization algorithms," we discovered a new connection between algorithms and machine learning, showing how reinforcement learning can effectively find optimal (worst-case, uniform) algorithms for several classic combinatorial online optimization problems such as online matching and allocation.
Our work on scalable algorithms spans parallel, online and distributed algorithms for big data sets. In a recent FOCS’19 paper, we provided a near-optimal massively parallel computation algorithm for connected components. Another set of our papers improved parallel algorithms for matching (in theory and practice) and for density clustering. A third line of work concerned adaptively optimizing submodular functions in the black-box model, which has several applications in feature selection and vocabulary compression. In a SODA’19 paper, we presented a submodular maximization algorithm that is nearly optimal in three aspects: approximation factor, round complexity, and query complexity. And in another FOCS’19 paper, we provided the first online multiplicative approximation algorithm for PCA and column subset selection.
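As background for the submodular maximization results above: the classic greedy algorithm, which repeatedly adds the element with the largest marginal gain, already achieves a (1 - 1/e) approximation for monotone submodular functions under a cardinality constraint. The sketch below applies it to a toy coverage instance, purely to illustrate the problem class rather than the adaptive or parallel algorithms in our papers.

```python
def greedy_submodular_max(ground_set, f, k):
    """Greedy maximization of a monotone submodular set function f under a
    cardinality constraint |S| <= k (classic 1 - 1/e approximation)."""
    selected = set()
    for _ in range(k):
        best, best_gain = None, 0.0
        for e in ground_set - selected:
            gain = f(selected | {e}) - f(selected)
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:          # no remaining element adds value
            break
        selected.add(best)
    return selected

# Toy coverage instance: pick k sets covering as many items as possible.
sets = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6, 7},
    "d": {1, 7},
}
coverage = lambda S: len(set().union(*(sets[s] for s in S))) if S else 0
print(greedy_submodular_max(set(sets), coverage, k=2))  # e.g. {'c', 'a'}
```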
In other work, we introduced the semi-online model of computation, which postulates that the unknown future has a predictable part and an adversarial part. For classical combinatorial problems such as bipartite matching (ITCS’19) and caching (SODA’20), we obtained semi-online algorithms whose guarantees smoothly interpolate between those of the best possible online and offline algorithms.
Our recent research in the area of market algorithms includes new understanding of the interaction between learning and markets, and innovations in experimental design. For example, this NeurIPS’19 oral paper reveals the surprising competitive advantage that a strategic agent has when competing with a learning agent in a general repeated 2-player game. The recent focus on advertising automation has produced increased interest in automated bidding and in understanding the response behavior of advertisers. In a pair of WINE 2019 papers, we studied optimal strategies for maximizing conversions on behalf of advertisers and for learning how advertisers respond to changes in the auction. Finally, we studied experimental design in the presence of interference, where the treatment of one group may affect the outcomes of others. In a KDD'19 paper and a NeurIPS'19 paper, we showed how to define units or clusters of units to limit interference while maintaining experimental power.
The clustering algorithm from the KDD’19 paper “Randomized Experimental Design via Geographic Clustering” applied to user queries from the United States. The algorithm automatically identifies metropolitan areas, correctly predicting, for example, that the Bay Area includes San Francisco, Berkeley, and Palo Alto, but not Sacramento.
Machine Learning Algorithms In 2019, we conducted research in many different areas of machine learning algorithms and approaches. One major focus was understanding the properties of training dynamics in neural networks. In the blog post Measuring the Limits of Data Parallel Training for Neural Networks highlighting this paper, Google researchers presented a careful set of experimental results showing when scaling the amount of data parallelism (by using larger batch sizes) allows a model to converge faster.
For all workloads we tested, we observed a universal relationship between batch size and training speed with three distinct regimes: perfect scaling with small batch sizes (following the dashed line), eventually seeing diminishing returns as batch sizes grow (diverging from the dashed line), and maximal data parallelism at the largest batch sizes (where the trend plateaus). The transition points between the regimes vary dramatically between different workloads.
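As a minimal sketch of the setup being measured above (plain numpy rather than our training infrastructure), the snippet below splits a global batch across a hypothetical set of workers, computes a gradient per shard, and averages the results; with equal-sized shards this is equivalent to a single large-batch gradient step.

```python
import numpy as np

def grad_mse(w, x, y):
    """Gradient of mean squared error for a linear model on one data shard."""
    return 2 * x.T @ (x @ w - y) / len(y)

def data_parallel_step(w, x, y, num_workers=4, lr=0.1):
    """Split the global batch across workers, compute one gradient per shard,
    and average the per-shard gradients (exactly one full-batch SGD step when
    shards are equal-sized)."""
    x_shards = np.array_split(x, num_workers)
    y_shards = np.array_split(y, num_workers)
    grads = [grad_mse(w, xs, ys) for xs, ys in zip(x_shards, y_shards)]
    return w - lr * np.mean(grads, axis=0)

# Toy usage with a hypothetical global batch of 256 examples and 8 features.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 8))
y = x @ rng.normal(size=8)
w = data_parallel_step(np.zeros(8), x, y, num_workers=4)
```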
Model parallelism, in which a model is spread out across multiple computational devices (in contrast to data parallelism), can be an effective way of scaling models. GPipe is a library that makes model parallelism more effective, in an approach similar to that used by pipelined CPU processors: when one part of the whole model is working on some of the data, other parts can be working on their part of the computation on different data. The results of this pipeline approach can be combined together to simulate a larger effective batch size.
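The pipelining idea can be illustrated with a tiny scheduling sketch, a schematic rather than the GPipe library itself: a mini-batch is split into micro-batches so that while one stage works on micro-batch i, the next stage can already be working on micro-batch i-1.

```python
def pipeline_schedule(num_stages, num_microbatches):
    """Forward-pass schedule for GPipe-style pipelining: at clock tick t,
    stage s works on micro-batch t - s (when that index is valid)."""
    schedule = []
    for t in range(num_stages + num_microbatches - 1):
        busy = {s: t - s for s in range(num_stages)
                if 0 <= t - s < num_microbatches}
        schedule.append(busy)
    return schedule

for t, busy in enumerate(pipeline_schedule(num_stages=3, num_microbatches=4)):
    print(f"tick {t}: " +
          ", ".join(f"stage {s} -> microbatch {m}" for s, m in busy.items()))
# With 4 micro-batches and 3 stages, only the first and last two ticks have
# idle stages ("bubbles"); more micro-batches shrink that overhead.
```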
Machine learning models are effective when they’re able to take raw input data and learn “disentangled” higher-level representations that separate different kinds of examples by properties that we want the model to be able to distinguish (cat vs. truck vs. wildebeest, cancerous tissue vs. normal tissue, etc.). Much of the focus on advancing machine learning algorithms is to encourage the learning of better representations that generalize better to new examples, problems or domains. This year, we looked at this problem in a number of different contexts:
In Predicting the Generalization Gap in Deep Neural Networks, we showed that it is possible to predict the generalization gap (the gap between a model’s performance on its training data and its performance on unseen data drawn from the same distribution) using statistics of the margin distribution, helping us better understand which models generalize most effectively. We also did some research on Improving Out-of-Distribution Detection in Machine Learning Models, to better understand when a model is starting to encounter kinds of data it has never seen before. We also looked at Off-Policy Classification in the context of reinforcement learning, to better understand which models are likely to generalize the best.
In Learning to Generalize from Sparse and Underspecified Rewards, we also examined ways of specifying reward functions for reinforcement learning that enable learning systems to more directly learn from true objectives and be less distracted with longer, less-desirable sequences of actions that happen to achieve desired goals by accident.
In this instruction-following task, the action trajectories a1, a2 and a3 reach the goal, but the sequences a2 and a3 do not follow the instructions. This illustrates the issue of underspecified rewards.
AutoML We continued our work on AutoML this year, an approach whereby algorithms that learn how to learn can automate many aspects of machine learning, often achieving substantially better results than the best human machine learning experts on certain kinds of machine learning meta-decisions. In particular:
In EfficientNet: Improving Accuracy and Efficiency through AutoML and Model Scaling, we showed how to use neural architecture search techniques to achieve substantially better results on computer vision problems, including a new state-of-the-art result of 84.4% top-1 accuracy on ImageNet while having 8X fewer parameters than the previous best model.
Model Size vs. Accuracy Comparison. EfficientNet-B0 is the baseline network developed by AutoML MNAS, while Efficient-B1 to B7 are obtained by scaling up the baseline network. In particular, our EfficientNet-B7 achieves new state-of-the-art 84.4% top-1 / 97.1% top-5 accuracy, while being 8.4x smaller than the best existing CNN.
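The compound scaling rule used to grow EfficientNet-B0 into B1 through B7 scales depth, width and input resolution together with a single coefficient φ. The sketch below illustrates that rule; the coefficient values are only approximately those reported in the EfficientNet paper (α ≈ 1.2, β ≈ 1.1, γ ≈ 1.15) and are shown purely for illustration.

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15,
                   base_depth=1.0, base_width=1.0, base_resolution=224):
    """EfficientNet-style compound scaling: depth, width and resolution are all
    scaled by a single coefficient phi (alpha * beta**2 * gamma**2 ~= 2, so each
    unit increase in phi roughly doubles the FLOPs)."""
    depth_mult = base_depth * alpha ** phi
    width_mult = base_width * beta ** phi
    resolution = int(round(base_resolution * gamma ** phi))
    return depth_mult, width_mult, resolution

for phi in range(4):
    print(phi, compound_scale(phi))
# phi=0 corresponds to the B0 baseline; larger phi gives deeper, wider models
# operating on higher-resolution inputs.
```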
In Video Architecture Search, we describe how we extended our AutoML work to the domain of video models, finding architectures that achieve state-of-the-art results, and also lightweight architectures that match the performance of hand-crafted models while using 50x less computation.
TinyVideoNet (TVN) architectures evolved to maximize recognition performance while keeping their computation time within the desired limit. For instance, TVN-1 (top) runs at 37 ms on a CPU and 10 ms on a GPU. TVN-2 (bottom) runs at 65 ms on a CPU and 13 ms on a GPU.
We developed AutoML techniques for tabular data, unlocking an important domain where many companies and organizations have interesting data in relational databases, and often want to develop machine learning models on this data. We collaborated to release this technology as a new Google Cloud AutoML Tables product, and also discussed how well this system did in a new Kaggle competition in An End-to-End AutoML Solution for Tabular Data at KaggleDays (spoiler: AutoML Tables finished second out of 74 teams of expert data scientists).
In Exploring Weight Agnostic Neural Networks, we showed how it is possible to find interesting neural network architectures without any training steps to update the weights of the evaluated models. This can make architecture search much more computationally efficient.
A weight-agnostic neural network performing a Cartpole Swing-up task at various different weight parameters, and also using fine-tuned weight parameters.
In Applying AutoML to Transformer Architectures, we explored finding architectures for natural language processing tasks that significantly outperform vanilla Transformer models at substantially reduced computational costs.
Comparison between the Evolved Transformer and the original Transformer on WMT’14 En-De at varying sizes. The biggest gains in performance occur at smaller sizes, while ET also shows strength at larger sizes, outperforming the largest Transformer with 37.6% fewer parameters (models to compare are circled in green). See Table 3 in our paper for the exact numbers.
In SpecAugment: A New Data Augmentation Method for Automatic Speech Recognition, we showed that automatically learning data augmentation methods can be extended to speech recognition models, with the learned augmentation policies achieving significantly higher accuracy with less data than existing expert-designed data augmentation approaches (a rough sketch of this kind of spectrogram masking appears after this list).
We launched our first speech application for keyword spotting and spoken language identification using AutoML. In our experiments, we found models that were both more efficient and more accurate than the human-designed models that had been used in this setting for some time.
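As a rough illustration of the spectrogram masking that SpecAugment applies (referenced two items above), the sketch below zeroes out one random band of frequency channels and one random span of time steps in a (time, frequency) array. The actual learned policies also include time warping and multiple masks, and the parameter names here are hypothetical.

```python
import numpy as np

def spec_augment(spectrogram, max_freq_mask=8, max_time_mask=16, rng=None):
    """Apply one frequency mask and one time mask to a (time, freq) spectrogram,
    in the spirit of SpecAugment (time warping omitted in this sketch)."""
    rng = rng or np.random.default_rng()
    augmented = spectrogram.copy()
    num_t, num_f = augmented.shape

    f = rng.integers(0, max_freq_mask + 1)        # width of the frequency mask
    f0 = rng.integers(0, num_f - f + 1)
    augmented[:, f0:f0 + f] = 0.0

    t = rng.integers(0, max_time_mask + 1)        # width of the time mask
    t0 = rng.integers(0, num_t - t + 1)
    augmented[t0:t0 + t, :] = 0.0
    return augmented

# Toy usage on a random "log-mel spectrogram" of 100 frames x 80 mel channels.
spec = np.random.default_rng(0).normal(size=(100, 80))
masked = spec_augment(spec)
```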
Natural Language Understanding The past few years have seen remarkable advances in models for natural language understanding, translation, natural dialog, speech recognition and related tasks. This year, one theme in our work was advancing the state of the art by combining modalities or tasks, to train more powerful and capable models. A few examples:
Left: Language pairs with larger amounts of training data generally have higher translation quality. Right: Multilingual training, where we train a single model for all language pairs rather than separate models for each language pair, results in substantial improvements in BLEU score (a measure of translation quality) for language pairs without much data.
Left: A traditional monolingual speech recognizer composed of Acoustic, Pronunciation and Language Models for each language. Middle: A traditional multilingual speech recognizer where the Acoustic and Pronunciation models are multilingual, while the Language model is language-specific. Right: An E2E multilingual speech recognizer where the Acoustic, Pronunciation and Language Models are combined into a single multilingual model.
In Translatotron: An End-to-End Speech-to-Speech Translation Model, we showed that it is possible to train a joint model to accomplish the (normally separate) tasks of speech recognition, translation and text-to-speech generation with nice benefits, like preserving the sound of the speaker’s voice in the generated translated audio, as well as a simpler overall learning system.
In Robust Neural Machine Translation, we showed how to use an adversarial training procedure to significantly improve the quality and robustness of language translations.
Left: The Transformer model is applied to an input sentence (lower left) and, in conjunction with the target output sentence (above right) and target input sentence (middle right; beginning with the placeholder “<sos>”), the translation loss is calculated. The AdvGen function then takes the source sentence, word selection distribution, word candidates and the translation loss as inputs to construct an adversarial source example. Right: In the defense stage, the adversarial source example serves as input to the Transformer model and the translation loss is calculated. AdvGen then uses the same method as above to generate an adversarial target example from the target input.
As our language understanding capabilities have improved, based on fundamental research advances like seq2seq, Transformer, BERT, Transformer-XL and ALBERT models, we have seen increased use of these sorts of models in many of our core products and features like Google Translate, Gmail’s Smart Compose, and Google Search. This year, the launch of BERT in our core search and ranking algorithms led to the biggest improvement in search quality in the last five years (and one of the biggest ever), through better understanding of the subtle meanings of query and document words and phrases.
Machine Perception Models for better understanding of still images have made remarkable progress in the last decade. Among the next major frontiers are models and approaches for understanding the dynamic world in fine-grained detail. This includes deeper and more nuanced understanding of images and video, as well as live and situated perception: understanding the audiovisual world at interactive rates and with a shared spatial grounding with the user. This year, we explored many aspects of advances in this area, including:
Finer-grained visual understanding in Lens, enabling even more powerful visual search.
Helpful smart camera features such as Quick Gestures, Face Match and smart video call framing on the Nest Hub Max.
Technology for live and spatially-aware perception for helpfully augmenting the world around us through Lens.
Right: Input videos of people performing a squat exercise. The video on the top left is the reference. The other videos show nearest neighbor frames (in the TCC embedding space) from other videos of people doing squats. Left: The corresponding frame embeddings move as the action is performed.
Qualitative results from VideoBERT, pretrained on cooking videos. Top: Given some recipe text, we generate a sequence of visual tokens. Bottom: Given a visual token, we show the top three future tokens forecast by VideoBERT at different time scales. In this case, the model predicts that a bowl of flour and cocoa powder may be baked in an oven and may become a brownie or cupcake. We visualize the visual tokens using the images from the training set closest to the tokens in feature space.
We’re quite excited about the prospects of continued improvements in the understanding of the sensory world around us.
Robotics The application of machine learning to robotic control is a significant research area for us. We believe this is a vital tool for enabling robots to operate effectively in complex, real-world environments like everyday homes and businesses. Some of the work we did this year includes:
In PlaNet: A Deep Planning Network for Reinforcement Learning, we showed how to effectively learn a world model purely from the pixels of images, and how to leverage this model of how the world behaves in order to accomplish tasks with many fewer learning episodes.
In Unifying Physics and Deep Learning with TossingBot, we showed how robots can learn “intuitive” physics from experimentation in an environment, rather than being pre-programmed with physics models about the environment in which they are operating.
In Soft Actor-Critic: Deep Reinforcement Learning for Robotics, we showed that training a reinforcement learning algorithm to maximize both the expected reward (the standard RL objective) and the policy's entropy (so that learning favors policies that are more random) can help robots learn faster and be more robust to changes in their environment (the objective is written out after this list).
We introduced ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots, an open-source platform of cost-effective robots and curated benchmarks designed to facilitate research and development on physical robotics hardware in the real world.
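For reference, the maximum-entropy objective behind Soft Actor-Critic (mentioned in the list above) augments the usual expected return with an entropy bonus weighted by a temperature α:

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \Big[\, r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big]
```

Maximizing entropy alongside reward keeps the policy from collapsing prematurely onto a single behavior, which is one intuition for why the method learns faster and is more robust to perturbations.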
Helping Advance the Broader Developer and Researcher Community Open source is about more than code: it's about the community of contributors. It’s been an exciting year to be part of the open source community. We launched TensorFlow 2.0—the biggest TensorFlow release to date—which makes building ML systems and applications easier than ever. We added support for fast mobile GPU inference to TensorFlow Lite. We also launched Teachable Machine 2.0, a fast, easy web-based tool which can train a machine learning model with the click of a button, no coding required. We announced MLIR, open source machine learning compiler infrastructure that addresses the complexity of growing software and hardware fragmentation and makes it easier to build AI applications.
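To give a flavor of what the TensorFlow 2.0 workflow looks like, here is a generic toy example using the Keras-first API (not tied to any of the systems above): it defines, trains and evaluates a tiny classifier on synthetic data.

```python
import numpy as np
import tensorflow as tf

# Toy data: classify whether the sum of 10 features is positive.
x = np.random.randn(1000, 10).astype("float32")
y = (x.sum(axis=1) > 0).astype("int32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x, y, epochs=5, batch_size=32, verbose=0)
loss, acc = model.evaluate(x, y, verbose=0)
print(f"training-set accuracy: {acc:.2f}")
```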
We open-sourced MediaPipe, a framework for building perceptual and multimodal applied ML pipelines, and XNNPACK, a library of efficient floating-point neural network inference operators. As of the end of 2019, we had enabled more than 1,500 researchers around the world to access Cloud TPUs for free via the TensorFlow Research Cloud. Our Intro To TensorFlow at Coursera crossed 100,000 students. And we engaged with thousands of users while taking TensorFlow on the road to 11 different countries, hosted our first ever TensorFlow World and more.
Open Datasets Open datasets with clear and measurable goals are often very helpful in driving forward the field of machine learning. To help the research community find interesting datasets, we continue to index a wide variety of open datasets sourced from many different organizations with Google Dataset Search. We also think it's important to create new datasets for the community to explore and to develop new techniques, and to ensure we share open data responsibly. This year, we additionally released a number of open datasets across many different areas:
Open Images V5: An update to the popular Open Images dataset that includes segmentation masks for 2.8 million objects in 350 categories (so that it now has ~9M images annotated with image-level labels, object bounding boxes, object segmentation masks, and visual relationships).
Natural Questions: the first dataset to use naturally occurring queries and find answers by reading an entire page, rather than extracting answers from a short paragraph.
Google Research Football: a novel reinforcement learning environment where agents aim to master the world’s most popular sport—football (or, if you’re American, soccer). It’s important for reinforcement learning agents to have GOOOAAALLLSS!
Google-Landmarks-v2: over 5 million images (2x that of the first release) of more than 200 thousand different landmarks.
YouTube-8M Segments: A large-scale classification and temporal localization dataset that includes human-verified labels at the 5-second segment level of YouTube-8M videos.
PAWS and PAWS-X: To help with paraphrase identification, both datasets contain well-formed sentence pairs with high lexical overlap, in which around half of the pairs are paraphrases and half are not.
Natural language dialog datasets: CCPE and Taskmaster-1 both use a Wizard-of-Oz platform that pairs two people who engage in spoken conversations, to mimic a human-level conversation with a digital assistant.
The Visual Task Adaptation Benchmark: VTAB follows similar guidelines to ImageNet and GLUE but is based on one principle—a better representation is one that yields better performance on unseen tasks, with limited in-domain data.
Schema-Guided Dialogue Dataset: the largest publicly available corpus of task-oriented dialogues, with over 18,000 dialogues spanning 17 domains.
Research Community Interaction Finally, we’ve been busy within the broader academic and research community. In 2019 Google researchers presented hundreds of papers, participated in numerous conferences and received many awards and other accolades. We had a strong presence at:
CVPR: ~250 Googlers presented 40+ papers, talks, posters, workshops and more.
ICML: ~200 Googlers presented 100+ papers, talks, posters, workshops and more.
ICLR: ~200 Googlers presented 60+ papers, talks, posters, workshops and more.
ACL: ~100 Googlers presented 40+ papers, workshops and tutorials.
Interspeech: Over 100 Googlers presented 30+ papers.
ICCV: ~200 Googlers presented 40+ papers, and several Googlers also won three prestigious ICCV awards.
NeurIPS: ~500 Googlers co-authored more than 120 accepted papers and engaged in various workshops and more.
We also brought together hundreds of Google researchers and faculty from across the globe to 15 separate research workshops hosted at Google locations. These workshops were on topics ranging from improving flood forecasting globally, to how to use machine learning to build systems that can better serve people with disabilities, to accelerating the development of algorithms, applications and tools for noisy-intermediate scale quantum (NISQ) processors.
New Places, New Faces We’ve made lots of headway in 2019, but there’s so much more we can do. To continue growing our impact around the world, we opened a Research office in Bangalore, and we’re expanding in other offices. If you’re excited about working on these sorts of problems, we’re hiring!
Looking Forward to 2020 and Beyond The past decade has seen remarkable advances in the fields of machine learning and computer science, where we now have given computers the ability to see, hear and understand language better than ever before (see a nice overview of important advances of the last decade). In our pockets, we now have sophisticated computing devices that can use these capabilities to better help us accomplish a multitude of tasks in our daily lives. We have substantially redesigned our computing platforms around these machine learning approaches by developing specialized hardware, giving us the ability to tackle ever larger problems. This has changed how we think about computing devices both in data centers (such as the inference-focused TPUv1 and the training-and-inference focused TPUv2 and TPUv3), as well as in low-power mobile environments (such as Edge TPUs). The deep learning revolution will continue to reshape how we think about computing and computers.
At the same time, there are a huge number of unanswered questions and unsolved problems. Some directions and questions that we are excited about tackling in 2020 and beyond are:
How can we build machine learning systems that can handle millions of tasks, and that can learn to successfully accomplish new tasks automatically? Currently, we’re mostly training separate machine learning models for each new task, starting from scratch, or at best, from a model trained on one or a few highly related tasks. As such, the models we train are really good at one or a few things, but not good at anything else. However, what we truly want are models that can leverage their expertise at doing many things, so that they are able to learn to do a new thing with relatively little training data and computation. This is a true grand challenge that will require expertise and advances in many areas, spanning solid-state circuit design, computer architecture, ML-focused compilers, distributed systems and machine learning algorithms, along with domain expertise across many other fields, in order to build systems that can generalize to solve new tasks independently across a full range of application areas.
How can we advance the state-of-the-art in important areas of artificial intelligence research like avoiding bias, increasing interpretability & understandability, improving privacy and ensuring safety? Advances in these areas are going to be critical as we use machine learning in more and more ways in society.
How can we apply computation and machine learning to make advances in important new areas of science? There are important advances to be had by collaborating with experts in fields such as climate science, healthcare and bioinformatics, among many others.
How can we ensure that the ideas and directions pursued by the machine learning and computer science research communities are put forth and explored by a diverse group of researchers? The work that the computer science and machine learning research communities are pursuing has broad implications for billions of people, and we want the set of researchers doing this work to represent the experiences, perspectives, concerns and creative enthusiasm of all the people of the world. How can we best support new researchers from diverse backgrounds entering the field?
Overall, 2019 was a very exciting year for research at Google and in the broader research community. We’re excited about tackling the research challenges ahead of us in 2020 and beyond, and we look forward to sharing our progress with you!
Posted by Jeff Dean, Senior Fellow and SVP of Google Research and Health, on behalf of the entire Google Research community The goal of Google Research is to work on long-term, ambitious problems, with an emphasis on solving ones that will dramatically help people throughout their daily lives. In pursuit of that goal in 2019, we made advances in a broad set of fundamental research areas, applied our research to new and emerging areas such as healthcare and robotics, open sourced a wide variety of code and continued collaborations with Google product teams to build tools and services that are dramatically more helpful for our users.
As we start 2020, it’s useful to take a step back and assess the research work we’ve done over the past year, and also to look forward to what sorts of problems we want to tackle in the upcoming years. In that spirit, this blog post is a survey of some of the research-focused work done by Google researchers and engineers during 2019 (in the spirit of similar reviews for 2018, and more narrowly focused reviews of some work in 2017 and 2016). For a more comprehensive look, please see our research publications in 2019.
Ethical Use of AI In 2018, we published a set of AI Principles that provide a framework by which we evaluate our own research and applications of technologies such as machine learning in our products. In June 2019, we published a one-year update about how these principles are being put into practice in many different aspects of our research and product development life cycles. Since many of the areas touched on by the principles are active areas of research in the broader AI and machine learning research community (such as bias, safety, fairness, accountability, transparency and privacy in machine learning systems), our goals are to apply the best currently-known techniques in these areas to our work, and also to do research to continue to advance the state of the art in these important areas.
Released a beta version of Fairness Indicators, to help ML practitioners identify unjust or unintended impacts of machine learning models.
Clicking on a slice in Fairness Indicators will load all the data points in that slice inside the What-If Tool widget. In this case, all data points with the “female” label are shown.
Published a KDD'19 paper on how pairwise comparisons and regularization is incorporated into a large-scale production recommender system to improve ML Fairness.
Published an AIES'19 paper about a case study on the application of fairness in machine learning research to a production classification system, and described our fairness metric, conditional equality, that takes into account distributional differences in implementing equality of opportunity.
Published an AIES'19 paper about counterfactual fairness in text classification problems that asks the question: "How would the prediction change if the sensitive attribute referenced in the example were different?" and used this approach to improve our production systems that assess the toxicity of online content.
A sample of videos from Google’s contribution to the FaceForensics benchmark. To generate these, pairs of actors were selected randomly, and deep neural networks swapped the face of one actor onto the head of another.
In 2019 we updated Google Earth Timelapse, enabling people to effectively and intuitively visualize how the planet has changed over the past 35 years. Further, we’ve been collaborating with academic researchers on new privacy-preserving ways to aggregate data on human mobility, to give urban planners better information about how to design efficient environments with lower levels of carbon emissions. We’ve also applied machine learning to support childhood learning. According to the United Nations, 617 million children do not have basic literacy, a critical determinant of their quality of life. To help more children learn to read, our Bolo app uses speech-recognition technology that tutors students in real-time. And to increase access, the app works completely offline on low-cost phones. In India, Bolo has already helped 800,000 children read stories and speak half a billion words. Early results are encouraging; a three-month pilot among 200 villages in India showed an improvement in reading proficiency among 64% of pilot participants.
For older students, the Socratic app can help high schoolers with complex problems in math, physics and over 1,000 higher education topics. Based on a photo or verbal question, the app automatically identifies the question’s underlying concepts and links to the most helpful online resources. Like the Socratic method, the app doesn’t directly answer questions, but instead leads students to discover the answer themselves. We’re excited about the broad possibilities of improving educational outcomes around the world through things like Bolo and Socratic.
To expand the reach of our AI for Social Good efforts, in May we announced the grantees of our AI Impact Challenge with $25 million in grants from Google.org. The response was huge: we received over 2,600 thoughtful proposals from 119 countries. Twenty impressive organizations stood out for their potential to solve big social and environmental problems and were our initial set of grantees. A few examples of the work of these organizations:
Over a billion people live in smallholder farm households. A single pest attack can devastate their crop yields and livelihoods. Wadhwani AI uses image classification models that can identify pests and provide timely advice on what pesticides to spray and when—ultimately improving crop yield.
And deep in tropical rainforests, where illegal deforestation is a major driver of climate change, Rainforest Connection uses deep learning for bioacoustic monitoring and old cell phones to track rainforest health and detect threats.
Our 20 AI Impact Challenge winners. You can learn more about the work of all the grantees here.
Applications of AI to Other Fields The application of computer science and machine learning to other scientific fields is an area that we are especially excited about and have published a number of papers in, often in multi-organization collaborations. Some highlights from this year include:
In An Interactive, Automated 3D Reconstruction of a Fly Brain, we reported on a collaborative effort that achieved a milestone of mapping the structure of an entire fly brain, using machine learning models that were able to painstakingly trace each individual neuron.
In Learning Better Simulation Methods for Partial Differential Equations (PDEs), we showed how machine learning can be used to accelerate PDE computations, which are at the heart of many fundamental computational problems in climate science, fluid dynamics, electromagnetism, heat conduction and general relativity.
Simulations of Burgers’ equation, a model for shock waves in fluids, solved with either a standard finite volume method (left) or our neural network based method (right). The orange squares represent simulations with each method on low resolution grids. These points are fed back into the model at each time step, which then predicts how they should change. Blue lines show the exact simulations used for training. The neural network solution is much better, even on a 4x coarser grid, as indicated by the orange squares smoothly tracing the blue line.
2D snapshot of our embedding space with some example odors highlighted. Left: Each odor is clustered in its own space. Right: The hierarchical nature of the odor descriptor. Shaded and contoured areas are computed with a kernel-density estimate of the embeddings.
Machine learning can also help us in our artistic and creative endeavors. Artists have found ways to collaborate with AI and AR and create interesting new forms, from dancing with a machine to reimagine choreography, to creating new melodies with machine learning tools. ML can be used by novices, too. To honor the birthday of J.S. Bach, we featured a ML-powered Doodle: just create your melody, and the ML tool can create accompanying harmonizations in Bach’s style.
Assistive Technology On a more personal scale, ML can help us in our daily lives. It’s easy to take for granted our ability to see a beautiful image, to hear a favorite song, or to speak with a loved one. Yet over one billion people aren’t able to access the world in these ways. ML technology can help by turning these signals—vision, hearing, speech—into other signals that can be well-managed by people with accessibility needs, enabling better access to the world around them. A few examples of our assistive technology:
Lookout helps people who are blind or have low vision identify information about their surroundings. It draws upon similar underlying technology as Google Lens, which lets you search and take action on the objects around you, simply by pointing your phone.
Live Transcribe has the potential to give people who are deaf or hard of hearing greater independence in their everyday interactions. You can get real-time transcriptions of conversations that the user is engaged in, even if the speech is in another language.
Project Euphonia performs personalized speech-to-text transcription. For people with ALS and other conditions that produce slurred or non-standard speech, this research improves automatic speech recognition (ASR) over other state-of-the-art ASR models.
Like Project Euphonia, Parrotron uses end-to-end neural networks to help improve communication, but the research focuses on automatic speech-to-speech conversion rather than transcription, presenting a speech interface that may be easier for some to access.
Millions of images online don’t have any text description. Get Image Descriptions from Google helps blind or low vision users understand unlabelled images. When a screen reader encounters an image or graphic without a description, Chrome can now create one automatically.
We developed tools that can read visual text in audio form in Lens for Google Go, greatly helping users who are not fully literate navigate the word-rich world around them.
Making Your Phone More Intelligent Much of our work serves to enable intelligent, personal devices by giving mobile phones new capabilities through the use of on-device machine learning. By making powerful models that can run on-device, we can ensure that these phone features are highly responsive and always available even in airplane mode or otherwise off the network. We’ve made progress in getting highly accurate speech recognition models, vision models and handwriting recognition models all running on-device, paving the way for powerful new features. Some of this year’s highlights include:
The creation of a powerful new transcribing Recorder app, which can help index audio information and make it easily retrievable.
Improvements to Google Translate’s camera translation, so that you can point at text in an unfamiliar language and get it instantly translated in context.
Federated learning (check out the online comic description!) is a powerful machine learning approach invented by Google researchers in 2015, whereby many clients (such as mobile devices or whole organizations) collaboratively train a model, while keeping the training data decentralized. This enables approaches that have superior privacy properties in large-scale learning systems. We are using federated learning in more and more of our products and features, while also working to advance the state of the art in many research problems in this space. In 2019, Google researchers collaborated with authors from 24 (!) academic institutions to produce a survey article on Federated Learning, highlighting advances over the past few years as well describing a number of open research problems in the field.
Health In late 2018, we combined the Google Research health team, Deepmind Health and a team from Google’s Hardware division focused on health-related applications to form Google Health. In 2019 we continued the research we’ve been pursuing in this space, publishing research papers and building tools in collaboration with a variety of healthcare partners. Here are a few of the highlights from 2019:
We showed that a deep learning model for mammography can assist physicians in spotting breast cancer, a condition that affects 1 in 8 women in the US during their lifetimes, with greater accuracy than experts, reducing both false positives and false negatives. The model trained on de-identified data from a UK hospital had similar gains in accuracy when used to evaluate patients in a completely different healthcare system in the U.S.
Example of a difficult-to-detect cancer case correctly identified by machine learning.
Working alongside experts from the US Department of Veterans Affairs (VA), DeepMind Health colleagues who are now part of Google Health showed that a machine learning model can predict the onset of acute kidney injury (AKI), one of the leading causes of avoidable patient harm, up to two days before it happens. In the future, this could give doctors a 48-hour head start in treating this serious condition.
We showed a promising step forward for predicting lung cancer, where a deep learning model for examining the results of a single CT scan study performed on par or better than trained radiologists at early detection of lung cancer. Early detection of lung cancer dramatically improves survival rates.
We published a research paper on an augmented reality microscope for cancer diagnosis, whereby a pathologist can get real-time feedback about what parts of a slide are most interesting while examining tissue through a microscope. You can also read more about it in our 2018 blog post here.
Quantum Computing In 2019, our quantum computing team demonstrated for the first time a computational task that can be executed exponentially faster on a quantum processor than on the world’s fastest classical computer — just 200 seconds compared to 10,000 years.
Left: Artist's rendition of the Sycamore processor mounted in the cryostat. (Full Res Version; Forest Stearns, Google AI Quantum Artist in Residence) Right: Photograph of the Sycamore processor. (Full Res Version; Erik Lucero, Research Scientist and Lead Production Quantum Hardware)
Using quantum computers may make important problems in domains like materials science, quantum chemistry (early example) and large-scale optimization tractable, but in order to make this a reality, we’ll have to continue to push the field forward. We are now focusing on implementing quantum error correction so that we will be able to run computations for longer. We are also working on making quantum algorithms easier to express and the hardware easier to control, and we have found ways to use classical machine learning techniques like deep reinforcement learning to build more reliable quantum processors. The achievements this year are encouraging and are early steps along the way to making practical quantum computing a reality for a wider variety of problems.
We published a paper at VLDB’19 titled "Cache-aware load balancing of data center applications," although an alternative title could be "Increase the serving capacity of your data center by 40% with this one cool trick!". The paper describes how we used balanced partitioning of graphs to specialize the caches in our web search backend serving system, thereby increasing the query throughput of our flash drives by 48%, and helping to enable a 40% increase in the throughput of the entire search backend.
Heatmap of flash IO requests (resulting from cache misses) across web search serving leaves. The three humps represent random leaf selection, load balancing, and cache-aware load balancing (left to right). Lines indicate the 50th, 90th, 95th and 99.9th percentiles. From VLDB’19 paper, "Cache-aware load balancing of data center applications."
In an ICLR 2019 paper titled "A new dog learns old tricks: RL finds classic optimization algorithms," we discovered a new connection between algorithms and machine learning, showing how reinforcement learning can effectively find optimal (worst-case, uniform) algorithms for several classic online combinatorial optimization problems such as online matching and allocation.
Our work in scalable algorithms spans parallel, online and distributed algorithms for big data sets. In a recent FOCS’19 paper, we provided a near-optimal massively parallel computation algorithm for connected components. Another set of our papers improved parallel algorithms for matching (in theory and practice) and for density clustering. A third line of work concerned adaptively optimizing submodular functions in the black-box model, which has several applications in feature selection and vocabulary compression. In a SODA’19 paper, we presented a submodular maximization algorithm that is nearly optimal in three aspects: approximation factor, round complexity, and query complexity. And in another FOCS’19 paper, we provided the first online multiplicative approximation algorithm for PCA and column subset selection.
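For readers less familiar with this setting, the sketch below shows the classic greedy baseline for maximizing a monotone submodular function under a cardinality constraint; it is included only as context for the problem class, and is not the adaptive or distributed algorithms from the papers mentioned above.

```python
def greedy_submodular_max(elements, f, k):
    """Classic greedy baseline: repeatedly add the element with the largest
    marginal gain. `f` is assumed to be a monotone submodular set function."""
    selected = set()
    for _ in range(k):
        best, best_gain = None, 0.0
        for e in elements:
            if e in selected:
                continue
            gain = f(selected | {e}) - f(selected)
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:      # no remaining element adds positive value
            break
        selected.add(best)
    return selected

# Toy coverage objective: f(S) = number of items covered by the chosen sets.
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}
coverage = lambda S: len(set().union(*(sets[s] for s in S))) if S else 0
print(greedy_submodular_max(sets.keys(), coverage, k=2))  # -> {'a', 'c'}
```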
In other work, we introduced the semi-online model of computation, which postulates that the unknown future has a predictable part and an adversarial part. For classical combinatorial problems such as bipartite matching (ITCS’19) and caching (SODA’20), we obtained semi-online algorithms whose guarantees smoothly interpolate between the best possible online and offline algorithms.
Our recent research in the area of market algorithms includes new understanding of the interaction between learning and markets, and innovations in experimental design. For example, this NeurIPS’19 oral paper reveals the surprising competitive advantage that a strategic agent has when competing with a learning agent in a general repeated 2-player game. Recent focus on advertising automation has produced increased interest in automated bidding and in understanding advertiser response behavior. In a pair of WINE 2019 papers, we studied optimal strategies for maximizing conversions on behalf of advertisers and for learning how advertisers respond to changes in the auction. Finally, we studied experimental design in the presence of interference, where the treatment of one group may affect the outcomes of others. In a KDD’19 paper and a NeurIPS’19 paper, we showed how to define units or clusters of units to limit interference while maintaining experimental power.
The clustering algorithm from the KDD’19 paper “Randomized Experimental Design via Geographic Clustering“ applied to user queries from the United States. The algorithm automatically identifies metropolitan areas, correctly predicting, for example, that the Bay Area includes San Francisco, Berkeley, and Palo Alto, but not Sacramento.
Machine Learning Algorithms In 2019, we conducted research in many different areas of machine learning algorithms and approaches. One major focus was understanding the properties of training dynamics in neural networks. In the blog post Measuring the Limits of Data Parallel Training for Neural Networks highlighting this paper, Google researchers presented a careful set of experimental results showing when scaling the amount of data parallelism (by using larger batches) is effective for allowing a model to converge faster.
For all workloads we tested, we observed a universal relationship between batch size and training speed with three distinct regimes: perfect scaling with small batch sizes (following the dashed line), eventually seeing diminishing returns as batch sizes grow (diverging from the dashed line), and maximal data parallelism at the largest batch sizes (where the trend plateaus). The transition points between the regimes vary dramatically between different workloads.
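A toy way to picture that relationship (our own simplified model, not the paper's fitted curves): steps-to-result shrinks in proportion to batch size until a workload-dependent floor is reached, after which extra data parallelism stops helping.

```python
def steps_to_result(batch_size, serial_steps=1_000_000, min_steps=2_000):
    """Hypothetical smooth interpolation between perfect scaling (steps ~ 1/batch)
    and the plateau regime where more data parallelism no longer helps."""
    return min_steps + serial_steps / batch_size

for b in [32, 256, 2048, 16384, 131072]:
    print(f"batch size {b:>6d}: ~{steps_to_result(b):,.0f} training steps")
```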
Model parallelism, in which a model is spread across multiple computational devices (in contrast to data parallelism), can be an effective way of scaling models. GPipe is a library that enables model parallelism to be more effective, in an approach similar to that used by pipelined CPU processors: while one part of the model is working on some of the data, other parts can be working on their part of the computation on different data. The results of this pipelined approach can be combined to simulate a larger effective batch size.
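A rough illustration of the pipelining idea (our simplified sketch, not the GPipe implementation): a mini-batch is split into micro-batches that flow through the model's stages, so that at any given tick several stages are each busy with a different micro-batch, and the micro-batch results are accumulated to simulate the larger batch.

```python
def pipeline_schedule(num_stages, num_microbatches):
    """For each clock tick, list which (stage, micro-batch) pairs run in parallel.
    Simple forward-only schedule: micro-batch m reaches stage s at tick s + m."""
    ticks = []
    for t in range(num_stages + num_microbatches - 1):
        active = [(s, t - s) for s in range(num_stages)
                  if 0 <= t - s < num_microbatches]
        ticks.append(active)
    return ticks

for t, active in enumerate(pipeline_schedule(num_stages=4, num_microbatches=4)):
    print(f"tick {t}: " + ", ".join(f"stage{s} <- mb{m}" for s, m in active))
```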
Machine learning models are effective when they’re able to take raw input data and learn “disentangled” higher-level representations that separate different kinds of examples by properties that we want the model to be able to distinguish (cat vs. truck vs. wildebeest, cancerous tissue vs. normal tissue, etc.). Much of the focus on advancing machine learning algorithms is to encourage the learning of better representations that generalize better to new examples, problems or domains. This year, we looked at this problem in a number of different contexts:
In Predicting the Generalization Gap in Deep Neural Networks, we showed that it is possible to predict the generalization gap (the gap between a model’s performance on its training data and on unseen data) using statistics of the margin distribution, helping us better understand which models generalize most effectively. We also did some research on Improving Out-of-Distribution Detection in Machine Learning Models, to better understand when a model is starting to encounter kinds of data it has never seen before. We also looked at Off-Policy Classification in the context of reinforcement learning, to better understand which models are likely to generalize the best.
In Learning to Generalize from Sparse and Underspecified Rewards, we examined ways of specifying reward functions for reinforcement learning that enable learning systems to more directly learn from true objectives and be less distracted by longer, less desirable sequences of actions that happen to achieve desired goals by accident.
In this instruction-following task, the action trajectories a1, a2 and a3 reach the goal, but the sequences a2 and a3 do not follow the instructions. This illustrates the issue of underspecified rewards.
AutoML We continued our work on AutoML this year, an approach whereby algorithms that learn how to learn can automate many aspects of machine learning, often achieving substantially better results than the best human machine learning experts for certain kinds of meta-decisions. In particular:
In EfficientNet: Improving Accuracy and Efficiency through AutoML and Model Scaling, we showed how to use neural architecture search techniques to achieve substantially better results on computer vision problems, including a new state-of-the-art result of 84.4% top-1 accuracy on ImageNet while having 8X fewer parameters than the previous best model.
Model Size vs. Accuracy Comparison. EfficientNet-B0 is the baseline network developed by AutoML MNAS, while EfficientNet-B1 to B7 are obtained by scaling up the baseline network. In particular, our EfficientNet-B7 achieves new state-of-the-art 84.4% top-1 / 97.1% top-5 accuracy, while being 8.4x smaller than the best existing CNN.
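The compound scaling rule behind the B1 to B7 variants can be sketched as follows; the coefficients shown (alpha ≈ 1.2, beta ≈ 1.1, gamma ≈ 1.15) are the values we recall the paper reporting, and the snippet is only an illustration, not the released model code.

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15, base_resolution=224):
    """Scale depth, width and input resolution together with one coefficient phi,
    following the compound scaling rule described in the EfficientNet paper."""
    depth_multiplier = alpha ** phi
    width_multiplier = beta ** phi
    resolution = int(round(base_resolution * gamma ** phi))
    return depth_multiplier, width_multiplier, resolution

for phi in range(8):   # phi = 0..7, roughly corresponding to B0..B7
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution ~{r}px")
```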
In Video Architecture Search, we describe how we extended our AutoML work to the domain of video models, finding architectures that achieve state-of-the-art results, and also lightweight architectures that match the performance of hand-crafted models while using 50x less computation.
TinyVideoNet (TVN) architectures evolved to maximize recognition performance while keeping computation time within a desired limit. For instance, TVN-1 (top) runs at 37 ms on a CPU and 10 ms on a GPU. TVN-2 (bottom) runs at 65 ms on a CPU and 13 ms on a GPU.
We developed AutoML techniques for tabular data, unlocking an important domain where many companies and organizations have interesting data in relational databases, and often want to develop machine learning models on this data. We collaborated to release this technology as a new Google Cloud AutoML Tables product, and also discussed how well this system did in a new Kaggle competition in An End-to-End AutoML Solution for Tabular Data at KaggleDays (spoiler: AutoML Tables finished second out of 74 teams of expert data scientists).
In Exploring Weight Agnostic Neural Networks, we showed how it is possible to find interesting neural network architectures without any training steps to update the weights of the evaluated models. This can make architecture search much more computationally efficient.
A weight-agnostic neural network performing a Cartpole Swing-up task at various different weight parameters, and also using fine-tuned weight parameters.
Applying AutoML to Transformer Architectures explored finding architectures for natural language processing tasks that significantly outperform vanilla Transformer models at substantially reduced computational costs.
Comparison between the Evolved Transformer and the original Transformer on WMT’14 En-De at varying sizes. The biggest gains in performance occur at smaller sizes, while ET also shows strength at larger sizes, outperforming the largest Transformer with 37.6% fewer parameters (models to compare are circled in green). See Table 3 in our paper for the exact numbers.
In SpecAugment: A New Data Augmentation Method for Automatic Speech Recognition, we showed that the approach of automatically learning data augmentation methods can be extended to speech recognition models, with the learned augmentation approaches achieving significantly higher accuracy with less data than existing human ML-expert driven data augmentation approaches.
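A minimal sketch of the kind of spectrogram masking SpecAugment applies, zeroing out random frequency bands and time spans of a log-mel spectrogram; the mask counts and widths below are arbitrary illustrative values, and the paper's time-warping step is omitted.

```python
import numpy as np

def spec_augment(spectrogram, num_freq_masks=2, max_freq_width=8,
                 num_time_masks=2, max_time_width=20, rng=None):
    """Apply random frequency and time masks to a (freq_bins, time_frames) array."""
    rng = rng or np.random.default_rng()
    spec = spectrogram.copy()
    n_freq, n_time = spec.shape
    for _ in range(num_freq_masks):
        width = int(rng.integers(0, max_freq_width + 1))
        start = int(rng.integers(0, n_freq - width + 1))
        spec[start:start + width, :] = 0.0                # mask a frequency band
    for _ in range(num_time_masks):
        width = int(rng.integers(0, max_time_width + 1))
        start = int(rng.integers(0, n_time - width + 1))
        spec[:, start:start + width] = 0.0                # mask a span of frames
    return spec

augmented = spec_augment(np.random.rand(80, 300))  # e.g. 80 mel bins x 300 frames
```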
We launched our first speech application for keyword spotting and spoken language identification using AutoML. In our experiments, we found models that are both more efficient and more accurate than the human-designed models that have been used in this setting for some time.
Natural Language Understanding The past few years have seen remarkable advances in models for natural language understanding, translation, natural dialog, speech recognition and related tasks. This year, one theme in our work was advancing the state of the art by combining modalities or tasks, to train more powerful and capable models. A few examples:
Left: Language pairs with larger amounts of training data generally have higher translation quality. Right: Multilingual training, where we train a single model for all language pairs rather than separate models for each language pair, results in substantial improvements in BLEU score (a measure of translation quality) for language pairs without much data.
Left: A traditional monolingual speech recognizer comprised of Acoustic, Pronunciation and Language Models for each language. Middle: A traditional multilingual speech recognizer where the Acoustic and Pronunciation model is multilingual, while the Language model is language-specific. Right: An E2E multilingual speech recognizer where the Acoustic, Pronunciation and Language Model is combined into a single multilingual model.
In Translatotron: An End-to-End Speech-to-Speech Translation Model, we showed that it is possible to train a joint model to accomplish the (normally separate) tasks of speech recognition, translation and text-to-speech generation with nice benefits, like preserving the sound of the speaker’s voice in the generated translated audio, as well as a simpler overall learning system.
In Robust Neural Machine Translation, we showed how to use an adversarial training procedure to significantly improve the quality and robustness of language translations.
Left: The Transformer model is applied to an input sentence (lower left) and, in conjunction with the target output sentence (above right) and target input sentence (middle right; beginning with the placeholder “<sos>”), the translation loss is calculated. The AdvGen function then takes the source sentence, word selection distribution, word candidates and the translation loss as inputs to construct an adversarial source example. Right: In the defense stage, the adversarial source example serves as input to the Transformer model and the translation loss is calculated. AdvGen then uses the same method as above to generate an adversarial target example from the target input.
As our language understanding capabilities have improved, based on fundamental research advances like seq2seq, Transformer, BERT, Transformer-XL and ALBERT models, we have seen increased use of these sorts of models in many of our core products and features like Google Translate, Gmail’s Smart Compose, and Google Search. This year, the launch of BERT in our core search and ranking algorithms led to the biggest improvement in search quality in the last five years (and one of the biggest ever), through better understanding of the subtle meanings of query and document words and phrases.
Machine Perception Models for better understanding of still images have made remarkable progress in the last decade. Among the next major frontiers are models and approaches for understanding the dynamic world in fine-grained detail. This includes deeper and more nuanced understanding of images and video, as well as live and situated perception: understanding the audiovisual world at interactive rates and with a shared spatial grounding with the user. This year, we explored many aspects of advances in this area, including:
Finer-grained visual understanding in Lens, enabling even more powerful visual search.
Helpful smart camera features such as Quick Gestures, Face Match and smart video call framing on the Nest Hub Max.
Technology for live and spatially-aware perception for helpfully augmenting the world around us through Lens.
Right: Input videos of people performing a squat exercise. The video on the top left is the reference. The other videos show nearest neighbor frames (in the TCC embedding space) from other videos of people doing squats. Left: The corresponding frame embeddings move as the action is performed.
Qualitative results from VideoBERT, pretrained on cooking videos. Top: Given some recipe text, we generate a sequence of visual tokens. Bottom: Given a visual token, we show the top three future tokens forecast by VideoBERT at different time scales. In this case, the model predicts that a bowl of flour and cocoa powder may be baked in an oven and may become a brownie or cupcake. We visualize the visual tokens using the images from the training set closest to the tokens in feature space.
We’re quite excited about the prospects of continued improvements in the understanding of the sensory world around us.
Robotics The application of machine learning to robotic control is a significant research area for us. We believe this is a vital tool for enabling robots to operate effectively in complex, real-world environments like everyday homes and businesses. Some of the work we did this year includes:
In PlaNet: A Deep Planning Network for Reinforcement Learning, we showed how to effectively learn a world model purely from the pixels of images, and how to leverage this model of how the world behaves in order to accomplish tasks with many fewer learning episodes.
In Unifying Physics and Deep Learning with TossingBot, we showed how robots can learn “intuitive” physics from experimentation in an environment, rather than being pre-programmed with physics models about the environment in which they are operating.
In Soft Actor-Critic: Deep Reinforcement Learning for Robotics, we showed that training a reinforcement learning algorithm to both maximize the expected reward (which is the standard RL objective) and to maximize the policy's entropy (so that learning favors policies that are more random), can help robots learn faster and be more robust to changes in their environment.
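Written out, the entropy-regularized objective being maximized is the standard maximum-entropy RL formulation (with a temperature α that trades off reward against policy entropy):

```latex
J(\pi) \;=\; \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
\Big[\, r(s_t, a_t) \;+\; \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) \;=\; -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[\log \pi(a \mid s)\big].
```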
We introduced ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots, an open-source platform of cost-effective robots and curated benchmarks designed to facilitate research and development on physical robotics hardware in the real world.
Helping Advance the Broader Developer and Researcher Community Open source is about more than code: it's about the community of contributors. It’s been an exciting year to be part of the open source community. We launched TensorFlow 2.0—the biggest TensorFlow release to date—which makes building ML systems and applications easier than ever. We added support for fast mobile GPU inference to TensorFlow Lite. We also launched Teachable Machine 2.0, a fast, easy web-based tool which can train a machine learning model with the click of a button, no coding required. We announced MLIR, open source machine learning compiler infrastructure that addresses the complexity of growing software and hardware fragmentation and makes it easier to build AI applications.
We open-sourced MediaPipe, a framework for building perceptual and multimodal applied ML pipelines, and XNNPACK, a library of efficient floating-point neural network inference operators. As of the end of 2019, we had enabled more than 1,500 researchers around the world to access Cloud TPUs for free via the TensorFlow Research Cloud. Our Intro To TensorFlow at Coursera crossed 100,000 students. And we engaged with thousands of users while taking TensorFlow on the road to 11 different countries, hosted our first ever TensorFlow World and more.
Open Datasets Open datasets with clear and measurable goals are often very helpful in driving forward the field of machine learning. To help the research community find interesting datasets, we continue to index a wide variety of open datasets sourced from many different organizations with Google Dataset Search. We also think it's important to create new datasets for the community to explore and to develop new techniques, and to ensure we share open data responsibly. This year, we additionally released a number of open datasets across many different areas:
Open Images V5: An update to the popular Open Images dataset that includes segmentation masks for 2.8 million objects in 350 categories (so that it now has ~9M images annotated with image-level labels, object bounding boxes, object segmentation masks, and visual relationships).
Natural Questions: the first dataset to use naturally occurring queries and find answers by reading an entire page, rather than extracting answers from a short paragraph.
Google Research Football: a novel reinforcement learning environment where agents aim to master the world’s most popular sport—football (or, if you’re American, soccer). It’s important for reinforcement learning agents to have GOOOAAALLLSS!
Google-Landmarks-v2: over 5 million images (2x that of the first release) of more than 200 thousand different landmarks.
YouTube-8M Segments: A large-scale classification and temporal localization dataset that includes human-verified labels at the 5-second segment level of YouTube-8M videos.
PAWS and PAWS-X: To help with paraphrase identification, both datasets contain well-formed sentence pairs with high lexical overlap, in which around half of pairs are paraphrase and half are not.
Natural language dialog datasets: CCPE and Taskmaster-1 both use a Wizard-of-Oz platform that pairs two people who engage in spoken conversations, to mimic a human-level conversation with a digital assistant.
The Visual Task Adaptation Benchmark: VTAB follows similar guidelines to ImageNet and GLUE but is based on one principle—a better representation is one that yields better performance on unseen tasks, with limited in-domain data.
Schema-Guided Dialogue Dataset: the largest publicly available corpus of task-oriented dialogues, with over 18,000 dialogues spanning 17 domains.
Research Community Interaction Finally, we’ve been busy within the broader academic and research community. In 2019 Google researchers presented hundreds of papers, participated in numerous conferences and received many awards and other accolades. We had a strong presence at:
CVPR: ~250 Googlers presented 40+ papers, talks, posters, workshops and more.
ICML: ~200 Googlers presented 100+ papers, talks, posters, workshops and more.
ICLR: ~200 Googlers presented 60+ papers, talks, posters, workshops and more.
ACL: ~100 Googlers presented 40+ papers, workshops and tutorials.
Interspeech: Over 100 Googlers presented 30+ papers.
ICCV: ~200 Googlers presented 40+ papers, and several Googlers also won three prestigious ICCV awards.
NeurIPS: ~500 Googlers co-authored more than 120 accepted papers and engaged in various workshops and more.
We also brought together hundreds of Google researchers and faculty from across the globe to 15 separate research workshops hosted at Google locations. These workshops were on topics ranging from improving flood forecasting globally, to how to use machine learning to build systems that can better serve people with disabilities, to accelerating the development of algorithms, applications and tools for noisy-intermediate scale quantum (NISQ) processors.
New Places, New Faces We’ve made lots of headway in 2019, but there’s so much more we can do. To continue growing our impact around the world, we opened a Research office in Bangalore, and we’re expanding in other offices. If you’re excited about working on these sorts of problems, we’re hiring!
Looking Forward to 2020 and Beyond The past decade has seen remarkable advances in the fields of machine learning and computer science, where we now have given computers the ability to see, hear and understand language better than ever before (see a nice overview of important advances of the last decade). In our pockets, we now have sophisticated computing devices that can use these capabilities to better help us accomplish a multitude of tasks in our daily lives. We have substantially redesigned our computing platforms around these machine learning approaches by developing specialized hardware, giving us the ability to tackle ever larger problems. This has changed how we think about computing devices both in data centers (such as the inference-focused TPUv1 and the training-and-inference focused TPUv2 and TPUv3), as well as in low-power mobile environments (such as Edge TPUs). The deep learning revolution will continue to reshape how we think about computing and computers.
At the same time, there are a huge number of unanswered questions and unsolved problems. Some directions and questions that we are excited about tackling in 2020 and beyond are:
How can we build machine learning systems that can handle millions of tasks, and that can learn to successfully accomplish new tasks automatically? Currently, we’re mostly training separate machine learning models for each new task, starting from scratch, or at best, from a model trained on one or a few highly related tasks. As such, the models we train are really good at one or a few things, but not good at anything else. However, what we truly want are models that are good at leveraging their expertise at doing many things, so that they are able to learn to do a new thing with relatively little training data and computation. This is a true grand challenge, and it will require expertise and advances in many areas spanning solid-state circuit design, computer architecture, ML-focused compilers, distributed systems and machine learning algorithms, as well as collaboration with domain experts across many other fields, in order to build systems that can generalize to solve new tasks independently across a full range of application areas.
How can we advance the state-of-the-art in important areas of artificial intelligence research like avoiding bias, increasing interpretability & understandability, improving privacy and ensuring safety? Advances in these areas are going to be critical as we use machine learning in more and more ways in society.
How can we apply computation and machine learning to make advances in important new areas of science? There are important advances to be had by collaborating with experts in other fields in areas like climate science, healthcare, bioinformatics and many other areas.
How can we ensure that the ideas and directions pursued by the machine learning and computer science research communities are put forth and explored by a diverse group of researchers? The work that the computer science and machine learning research communities are pursuing has broad implications for billions of people, and we want the set of researchers doing this work to represent the experiences, perspectives, concerns and creative enthusiasm of all the people of the world. How can we best support new researchers from diverse backgrounds entering the field?
Overall, 2019 was a very exciting year for research at Google and in the broader research community. We’re excited about tackling the research challenges ahead of us in 2020 and beyond, and we look forward to sharing our progress with you!
Andrew Helton, Editor, Google Research Communications
This week, Seoul, South Korea hosts the International Conference on Computer Vision 2019 (ICCV 2019), one of the world's premier conferences on computer vision. As a leader in computer vision research and a Gold Sponsor, Google will have a strong presence at ICCV 2019 with over 200 Googlers in attendance, more than 40 research presentations, and involvement in the organization of a number of workshops and tutorials.
If you are attending ICCV this year, please stop by our booth. There you can chat with researchers who are actively pursuing the latest innovations in computer vision and demo some of their latest research, including the technology behind MediaPipe, the new Open Images dataset, new developments for Google Lens and much more.
This year Google researchers are recipients of three prestigious ICCV awards:
Distinguished Researcher Award — Bill Freeman, Research Scientist, Google Research
Helmholtz Prize (Test of Time Award) — ICCV 2009 paper, "Building Rome in a Day", by Sameer Agarwal, Noah Snavely, Ian Simon, Steve Seitz and Rick Szeliski
Andrew Helton, Editor, Google Research Communications
This week, Graz, Austria hosts the 20th Annual Conference of the International Speech Communication Association (Interspeech 2019), one of the world’s most extensive conferences on research and engineering for spoken language processing. Over 2,000 experts in speech-related research fields gather to take part in oral presentations and poster sessions and to collaborate with streamed events across the globe.
As a Gold Sponsor of Interspeech 2019, we are excited to present 30 research publications, and demonstrate some of the impact speech technology has made in our products, from accessible, automatic video captioning to a more robust, reliable Google Assistant. If you’re attending Interspeech 2019, we hope that you’ll stop by the Google booth to meet our researchers and discuss projects and opportunities at Google that go into solving interesting problems for billions of people. Our researchers will also be on hand to discuss Google Cloud Text-to-Speech and Speech-to-Text, demo Parrotron, and more. You can also learn more about the Google research being presented at Interspeech 2019 below (Google affiliations in blue).
Organizing Committee includes: Michiel Bacchiani
Technical Program Committee includes: Tara Sainath
Tutorials
Neural Machine Translation Organizers include: Wolfgang Macherey, Yuan Cao
Accepted Publications
Building Large-Vocabulary ASR Systems for Languages Without Any Audio Training Data (link to appear soon) Manasa Prasad, Daan van Esch, Sandy Ritchie, Jonas Fromseier Mortensen
Multi-Microphone Adaptive Noise Cancellation for Robust Hotword Detection (link to appear soon) Yiteng Huang, Turaj Shabestary, Alexander Gruenstein, Li Wan
Improving Keyword Spotting and Language Identification via Neural Architecture Search at Scale (link to appear soon) Hanna Mazzawi, Javier Gonzalvo, Aleks Kracun, Prashant Sridhar, Niranjan Subrahmanya, Ignacio Lopez Moreno, Hyun Jin Park, Patrick Violette
Contextual Recovery of Out-of-Lattice Named Entities in Automatic Speech Recognition (link to appear soon) Jack Serrino, Leonid Velikovich, Petar Aleksic, Cyril Allauzen
An Investigation Into On-Device Personalization of End-to-End Automatic Speech Recognition Models (link to appear soon) Khe Chai Sim, Petr Zadrazil, Francoise Beaufays
Developing Pronunciation Models in New Languages Faster by Exploiting Common Grapheme-to-Phoneme Correspondences Across Languages (link to appear soon) Harry Bleyan, Sandy Ritchie, Jonas Fromseier Mortensen, Daan van Esch
Unified Verbalization for Speech Recognition & Synthesis Across Languages (link to appear soon) Sandy Ritchie, Richard Sproat, Kyle Gorman, Daan van Esch, Christian Schallhart, Nikos Bampounis, Benoit Brard, Jonas Mortensen, Amelia Holt, Eoin Mahon
Better Morphology Prediction for Better Speech Systems (link to appear soon) Dravyansh Sharma, Melissa Wilson, Antoine Bruguier
Large-Scale Visual Speech Recognition Brendan Shillingford, Yannis Assael, Matthew Hoffman, Thomas Paine, Cían Hughes, Utsav Prabhu, Hank Liao, Hasim Sak, Kanishka Rao, Lorrayne Bennett, Marie Mulville, Ben Coppin, Ben Laurie, Andrew Senior, Nando de Freitas
Posted by Tom Kwiatkowski and Michael Collins, Research Scientists, Google AI Language
Open-domain question answering (QA) is a benchmark task in natural language understanding (NLU) that aims to emulate how people look for information, finding answers to questions by reading and understanding entire documents. Given a question expressed in natural language ("Why is the sky blue?"), a QA system should be able to read the web (such as this Wikipedia page) and return the correct answer, even if the answer is somewhat complicated and long. However, there are currently no large, publicly available sources of naturally occurring questions (i.e. questions asked by a person seeking information) and answers that can be used to train and evaluate QA models. This is because assembling a high-quality dataset for question answering requires a large source of real questions and significant human effort in finding correct answers.
To help spur research advances in QA, we are excited to announce Natural Questions (NQ), a new, large-scale corpus for training and evaluating open-domain question answering systems, and the first to replicate the end-to-end process in which people find answers to questions. NQ is large, consisting of 300,000 naturally occurring questions, along with human annotated answers from Wikipedia pages, to be used in training QA systems. We have additionally included 16,000 examples where answers (to the same questions) are provided by 5 different annotators, useful for evaluating the performance of the learned QA systems. Since answering the questions in NQ requires much deeper understanding than is needed to answer trivia questions — which are already quite easy for computers to solve — we are also announcing a challenge based on this data to help advance natural language understanding in computers.
The Data NQ is the first dataset to use naturally occurring queries and focus on finding answers by reading an entire page, rather than extracting answers from a short paragraph. To create NQ, we started with real, anonymized, aggregated queries that users have posed to Google's search engine. We then asked annotators to find answers by reading through an entire Wikipedia page, as they would if the question had been their own. Annotators looked for both long answers that cover all of the information required to infer the answer, and short answers that answer the question succinctly with the names of one or more entities. The quality of the annotations in the NQ corpus has been measured at 90% accuracy.
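To give a feel for the shape of the data, here is a simplified, illustrative record structure; the field names are our own shorthand and not the exact schema of the released files.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class NQExample:
    """Illustrative (unofficial) structure of a Natural Questions example."""
    question: str                            # a real, anonymized user query
    page_url: str                            # the Wikipedia page the annotator read
    page_tokens: List[str]                   # the entire page, tokenized
    long_answer: Optional[Tuple[int, int]]   # token span covering the full answer, if any
    short_answers: List[Tuple[int, int]] = field(default_factory=list)  # concise entity spans

example = NQExample(
    question="why is the sky blue",
    page_url="https://en.wikipedia.org/wiki/Diffuse_sky_radiation",
    page_tokens=["Diffuse", "sky", "radiation", "is", "..."],
    long_answer=(0, 5),
    short_answers=[],
)
```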
The Challenge NQ is aimed at enabling QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. Systems will need to first decide whether the question is sufficiently well defined to be answerable — many questions make false assumptions or are just too ambiguous to be answered concisely. Then they will need to decide whether there is any part of the Wikipedia page that contains all of the information needed to infer the answer. We believe that the long answer identification task — finding all of the information required to infer an answer — requires a deeper level of language understanding than finding short answers once the long answers are known.
It is our hope that the release of NQ, and the associated challenge, will help spur the development of more effective and robust QA systems. We encourage the NLU community to participate and to help close the large gap between the performance of current state-of-the-art approaches and a human upper bound. Please visit the challenge website to view the leaderboard and learn more.
Posted by Jeff Dean, Senior Fellow and Google AI Lead, on behalf of the entire Google Research Community
2018 was an exciting year for Google's research teams, with our work advancing technology in many ways, including fundamental computer science research results and publications, the application of our research to emerging areas new to Google (such as healthcare and robotics), open source software contributions and strong collaborations with Google product teams, all aimed at providing useful tools and services. Below, we highlight just some of our efforts from 2018, and we look forward to what will come in the new year. For a more comprehensive look, please see our publications in 2018.
Ethical Principles and AI Over the past few years, we have observed major advances in AI and the positive impact it can have on our products and the everyday lives of our billions of users. For those of us working in this field, we care deeply that AI is a force for good in the world, and that it is applied ethically, and to problems that are beneficial to society. This year we published the Google AI Principles, supported with a set of responsible AI practices outlining technical recommendations for implementation. In combination they provide a framework for us to evaluate our own development of AI, and we hope that other organizations can also use these principles to help shape their own thinking. It's important to note that because this field is evolving quite rapidly, best practices in some of the principles noted, such as "Avoid creating or reinforcing unfair bias" or "Be accountable to people", are also changing and improving as we and others conduct new research in areas like ML fairness and model interpretability. This research in turn leads to advances in our products to make them more inclusive and less biased, such as our work on reducing gender biases in Google Translate, and allows the exploration and release of more inclusive image datasets and models that enable computer vision to work for the diversity of global cultures. Furthermore, this work allows us to share best practices with the broader research community with the Fairness Module in the Machine Learning Crash Course.
AI for Social Good The potential of AI to make dramatic impacts on many areas of social and societal importance is clear. One example of how AI can be applied to real-world problems is our work on flood prediction. In collaboration with many teams across Google, this research aims to provide accurate and timely fine-grained information about the likely extent and scope of flooding, enabling those in flood-prone regions to make better decisions about how best to protect themselves and their property. A second example is our work on earthquake aftershock prediction, where we showed that a machine learning (ML) model can predict aftershock locations much more accurately than traditional physics-based models. Perhaps more importantly, because the ML model was designed to be interpretable, scientists have been able to make new discoveries about the behavior of aftershocks, leading to not only more accurate predictions, but also new levels of understanding.
We have also seen a huge number of external parties, sometimes in collaboration with Google researchers and engineers, using open source software like TensorFlow to tackle a wide range of scientific and social problems, such as using convolutional neural networks to identify humpback whale calls, detecting new exoplanets, identifying diseased cassava plants and more. To spur creative activity in this area, we announced the Google AI for Social Impact Challenge in collaboration with Google.org, whereby individuals and organizations can receive grants from a total of $25M of funding, along with mentorship and advice from Google research scientists, engineers and other experts as they work to take a project with large potential social impact from idea to reality.
Assistive Technology Much of our research centered on using ML and computer science to help our users accomplish things faster and more effectively. Often, this research resulted in collaborations with various product teams to release its fruits in product features and settings. One example is Google Duplex, a system that requires research in natural language and dialogue understanding, speech recognition, text-to-speech, user understanding and effective UI design to all come together to enable an experience whereby a user can say "Can you book me a haircut at 4 PM today?", and a virtual agent will interact on your behalf over the telephone to handle the necessary details.
Other examples include Smart Compose, a tool that uses predictive models to give relevant suggestions about how to compose emails, making the process of email composition faster and easier, and Sound Search, a technology built on the Now Playing feature that enables you to discover what song is playing fast and accurately. Additionally, Smart Linkify in Android shows how we can use an on-device ML model to make many different kinds of text that appear on the screen of your phone more useful by understanding the kind of text you're selecting (e.g. knowing that something is an address, so we can offer a shortcut to a maps or direction link).
Quantum computing Quantum computing is an emerging paradigm for computing that promises the ability to solve challenging problems that no classical computer can solve. We have been actively pursuing research in this area for the past several years, and we believe the field is on the cusp of demonstrating this capability for at least one problem (so-called quantum supremacy), which will be a watershed event for the field. Over the last year we produced a number of exciting new results, including the development of Bristlecone, a new 72-qubit quantum computing device, which scales the size of problems that can be tackled in quantum computers in the run-up towards quantum supremacy.
A Bristlecone chip being installed by Research Scientist Marissa Giustina at the Quantum AI Lab in Santa Barbara.
Natural Language Understanding Natural language research at Google had an exciting 2018, with a mix of basic research as well as product-focused collaborations. We developed improvements to our Transformer work from 2017, resulting in a new parallel-in-time version of the model called the Universal Transformer that shows strong gains across a number of natural language tasks including translation and linguistic reasoning. We also developed BERT, the first deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus, that can then be fine-tuned on a wide variety of natural language tasks using transfer learning. BERT shows significant improvements over previous state-of-the-art results on 11 natural language tasks.
BERT also improves the state-of-the-art by 7.6% absolute on the very challenging GLUE benchmark, a set of 9 diverse Natural Language Understanding (NLU) tasks.
In addition to collaborating with various research teams to enable Smart Compose and Duplex (discussed previously), we worked to make the Google Assistant handle multilingual use cases better, with the goal of making the Assistant naturally conversational for all users.
Perception Our perception research tackles the hard problems of allowing computers to understand images, sounds, music and video, as well as providing more powerful tools for image capture, compression, processing, creative expression, and augmented reality. In 2018, our technology improved Google Photos' ability to organize the content that users most care about, such as people and pets. Google Lens and the Assistant enabled users to learn about the natural world, answer questions in real-time, and do more with Lens in Google Images. A key aspect of the Google AI mission is to empower others to benefit from our technology, and we've made a lot of progress this year in improving capabilities and building blocks that are parts of Google APIs. Examples include improved and new capabilities in vision and video in Cloud ML APIs and face-related on-device building blocks through ML Kit.
Google Lens can help you learn more about the world around you. Here, Lens identifies the breed of this dog. Learn more in this blog post.
In 2018, our contributions to academic research included advances in deep learning for 3D scene understanding, such as stereo magnification, which enables synthesizing novel photorealistic views of a scene. Our ongoing research on better understanding images and video enables users to find, organize, enhance and improve images and video in Google products such as Photos, YouTube, Search and more. In 2018, notable advances included a fast bottom-up model for joint pose estimation and person instance segmentation, a system for visualizing complex motion, a system which models spatio-temporal relations between people and objects and improvements in video action recognition based on distillation and 3D convolutions.
In the audio domain, we proposed a method for unsupervised learning of semantic audio representations as well as significant improvements to expressive and human-like speech synthesis. Multimodal perception is an increasingly important research topic. Looking to Listen combines visual and auditory cues in an input video to isolate and enhance the speech of desired speakers in a video. This technology could support a range of applications, from speech enhancement and recognition in videos, through video conferencing, to improved hearing aids, especially in situations where multiple people are speaking.
Enabling perception on resource-constrained platforms has become increasingly important. MobileNetV2 is Google's next-generation mobile computer vision model and our MobileNets are used widely across academia and industry. MorphNet proposes an efficient method for learning the structure of deep networks that results in across-the-board performance improvements on image and audio models while respecting computational resource constraints, and more recent work on automatic generation of mobile network architectures demonstrates that even higher performance is possible.
Computational Photography The improvements in quality and versatility of cell phone cameras over the last few years have been nothing short of remarkable. A modest part of this comes from improvements in the actual physical sensors used in phones, but a much greater part is due to advances in the scientific field of computational photography. Our research teams publish their new research techniques, and work closely with the Android and Consumer Hardware teams at Google to deliver this research into your hands in the latest Pixel and Android phones and other devices. In 2014, we introduced HDR+, a technique whereby the camera captures a burst of frames, aligns the frames in software, and merges them together computationally. Originally in the HDR+ work, this was to enable pictures to have higher dynamic range than was possible with a single exposure. However, capturing a burst of frames and then performing computational analysis of these frames is a general approach that has enabled many advances in cameras in 2018. For example, it allowed the development of Motion Photos in Pixel 2 and the Augmented Reality mode in Motion Stills.
Motion photos on the Pixel 2 in Google Photos. For more examples, check out this Google Photos album.
Augmented chicken family with Motion Stills AR mode.
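As a toy illustration of the burst align-and-merge idea behind HDR+ described above, and emphatically not the actual HDR+ pipeline, the sketch below aligns frames by a single global integer shift found via FFT cross-correlation and then averages them to reduce noise.

```python
import numpy as np

def align_and_merge(frames, reference=0):
    """Align each frame to the reference by a global integer shift (found with
    FFT cross-correlation), then average the aligned frames to reduce noise."""
    ref = frames[reference].astype(np.float64)
    merged = np.zeros_like(ref)
    for frame in frames:
        corr = np.fft.ifft2(np.fft.fft2(ref) * np.conj(np.fft.fft2(frame))).real
        dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
        merged += np.roll(frame, shift=(dy, dx), axis=(0, 1))
    return merged / len(frames)

# Toy burst: the same scene with small shifts and added noise.
rng = np.random.default_rng(1)
scene = rng.random((64, 64))
burst = [np.roll(scene, (dy, dx), axis=(0, 1)) + 0.1 * rng.standard_normal((64, 64))
         for dy, dx in [(0, 0), (1, 2), (-2, 1)]]
merged = align_and_merge(burst)
```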
This year, one of our primary efforts in computational photography research was to create a new capability called Night Sight, which enables Pixel phone cameras to "see in the dark", earning praise by both press and users. Of course, Night Sight is just one of the new software-enabled camera features our teams have developed to help you take the perfect photo, including using ML to provide better portrait mode shots, seeing better and further with Super Res Zoom and capturing special moments with Top Shot and Google Clips.
Software Systems A large part of our research on software systems continues to relate to building machine-learning models and to TensorFlow in particular. For example, we published on the design and implementation of dynamic control flow for TensorFlow 1.0. Some of our newer research introduces a system that we call Mesh TensorFlow, which makes it easy to specify large-scale distributed computations with model parallelism, sometimes with billions of parameters. As another example, we released a library for scalable deep neural ranking using TensorFlow.
The TF-Ranking library supports a multi-item scoring architecture, an extension of traditional single-item scoring.
We also released JAX, an accelerator-backed variant of NumPy that supports automatic differentiation of Python functions to arbitrary order. While JAX is not part of TensorFlow, it leverages some of the same underlying software infrastructure (e.g. XLA), and some of its ideas and algorithms have been helpful to our TensorFlow projects. Finally, we continued our research on the security and privacy of machine learning, and our development of open source frameworks for safety and privacy in AI systems, such as CleverHans and TensorFlow Privacy.
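As a tiny illustration of the arbitrary-order differentiation JAX supports (a toy example, not from any particular Google codebase):

```python
import jax
import jax.numpy as jnp

f = lambda x: jnp.sin(x) * x**2          # an ordinary Python function on arrays
d3f = jax.grad(jax.grad(jax.grad(f)))    # third derivative, just by composing grad

print(f(1.0), jax.grad(f)(1.0), d3f(1.0))
```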
Another important research direction for us is the application of ML to software systems, at many levels of the stack. For instance, we continued work on placement of computations onto devices, with a hierarchical model, and we contributed to learning memory access patterns. We also continued to explore how learned indices could be used to replace traditional index structures in database systems and storage systems. As I wrote last year, we believe that we are just scratching the surface in terms of the use of machine learning in computer systems.
The Hierarchical Planner's placement of an NMT (4-layer) model. White denotes CPU and the four colors each represent one of the GPUs. Note that every step of every layer is allocated across multiple GPUs. This placement is 53.7% faster than that generated by a human expert.
In 2018 we learned about Spectre and Meltdown, new classes of serious security vulnerabilities in modern computer processors, thanks to Google's Project Zero team in collaboration with others. These and related vulnerabilities will keep computer architecture researchers quite busy. In our continuing efforts to model CPU behavior, our Compiler Research team integrated their tool for measuring machine instruction latency and port pressure into LLVM, making possible better compilation decisions.
Google products, our Cloud offerings and inference for machine learning models depend critically on the ability to provide large-scale, reliable, efficient technical infrastructure for computing, storage and networking. A few research highlights from the past year include the evolution of Google's Software Defined Networking WAN, a stand-alone, federated query processing platform that executes SQL queries against data stored in different file-based formats, in many storage systems (BigTable, Spanner, Google Spreadsheets, etc.) and a report on our extensive use of code review, investigating the motivations behind code review at Google, current practices, and developers' satisfaction and challenges.
Running a large-scale web service, such as content hosting, requires load balancing with stability in a dynamic environment. We developed a consistent hashing scheme with tight provable guarantees on the maximum load of each server, and deployed it for our cloud customers in Google Cloud Pub/Sub. After making an earlier version of our paper available, engineers at Vimeo found the paper, implemented and open sourced it in haproxy, and used it for their load balancing project at Vimeo. The results were dramatic: applying these algorithmic ideas helped them decrease the cache bandwidth by a factor of almost 8, eliminating a scaling bottleneck.
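A minimal sketch of the idea (our simplified rendering, not the paper's exact algorithm or the haproxy implementation): keys and servers are placed on a hash ring, every server's load is capped at a small factor c above the average, and a key whose server is full simply moves clockwise to the next server with spare capacity.

```python
import hashlib
import math

def _h(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def assign_with_bounded_loads(keys, servers, c=1.25):
    """Assign each key to a server on a hash ring, capping every server's load at
    ceil(c * average_load); keys that hit a full server spill clockwise."""
    ring = sorted(servers, key=_h)                       # servers ordered by hash
    capacity = math.ceil(c * len(keys) / len(servers))
    load = {s: 0 for s in servers}
    assignment = {}
    for key in keys:
        kh = _h(key)
        start = next((i for i, s in enumerate(ring) if _h(s) >= kh), 0)
        for step in range(len(ring)):                    # walk until spare capacity
            server = ring[(start + step) % len(ring)]
            if load[server] < capacity:
                assignment[key] = server
                load[server] += 1
                break
    return assignment, load

assignment, load = assign_with_bounded_loads(
    [f"key{i}" for i in range(1000)], [f"server{i}" for i in range(10)])
print(load)  # no server exceeds ceil(1.25 * 100) = 125 keys
```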
TPUs Tensor Processing Units (TPUs) are Google's internally-developed ML hardware accelerators, designed from the ground up to power both training and inference at scale. TPUs have enabled Google research breakthroughs such as BERT (discussed previously), and they also allow researchers around the world to build on Google research via open source and to pursue new breakthroughs of their own. For example, anyone can fine-tune BERT on TPUs for free via Colab, and the TensorFlow Research Cloud has given thousands of researchers the opportunity to benefit from even larger amounts of free Cloud TPU computing power. We've also made multiple generations of TPU hardware commercially available as Cloud TPUs, including ML supercomputers called Cloud TPU Pods that make large-scale ML training much more accessible. Internally, in addition to enabling faster advances in ML research, TPUs have driven major improvements across Google's core products, including Search, YouTube, Gmail, Google Assistant, Google Translate, and many others. We look forward to seeing ML teams both here at Google and elsewhere achieve even more with ML via the unprecedented computing scale that TPUs provide.
An individual TPU v3 device (left) and a portion of a TPU v3 Pod (right). TPU v3 is the latest generation of Google's Tensor Processing Unit (TPU) hardware. Available to external customers as Cloud TPU v3, these systems are liquid-cooled for maximum performance (computer chips + liquid = exciting!), and a full TPU v3 Pod can apply more than 100 petaflops of computational power to the world's largest ML problems.
Open Source Software and Datasets Releasing open source software and the creation of new public datasets are two major ways that we contribute to the research and software engineering communities. One of our largest efforts in this space is TensorFlow, a widely popular system for ML computations that we released in November 2015. We celebrated TensorFlow's third birthday in 2018, and during this time, TensorFlow has been downloaded more than 30M times, with over 1700 contributors adding 45,000 commits. In 2018, TensorFlow had eight major releases and added major capabilities such as eager execution and distribution strategies. We launched public design reviews engaging the community in the development process, and we engaged contributors via special interest groups. With the launches of associated products such as TensorFlow Lite, TensorFlow.js and TensorFlow Probability, the TensorFlow ecosystem grew dramatically in 2018.
We are happy that TensorFlow has the strongest GitHub user retention of the top machine learning and deep learning frameworks. The TensorFlow team is also working to address GitHub issues faster and provide a smooth path for external contributors. In research, we continue to power much of the world's machine learning and deep learning research on a published-paper basis, according to Google Scholar data. TensorFlow Lite is now on more than 1.5B devices globally after being available for just one year. Additionally, TensorFlow.js is the number one ML framework for JavaScript; in the nine months since launch, it had over 2M Content Delivery Network (CDN) hits, 250K downloads and more than 10,000 stars on GitHub.
Real-time evolution of the tSNE embedding for the complete MNIST dataset. The dataset contains images of 60,000 handwritten digits. You can find a live demo here.
Public datasets are often a great source of inspiration, leading to great progress across many fields, since they give the broader community both access to interesting data and problems as well as a healthy competitive drive to achieve better results on a variety of tasks. This year we were happy to release Google Dataset Search, a new tool for finding public datasets from all of the web. Over the years we have also curated and released many new, novel datasets, including everything from millions of general annotated images or videos, to a crowd-sourced Bengali dataset for speech recognition to robot arm grasping datasets and more. In 2018, we added even more datasets to that list.
Pictures from India & Singapore added to Open Images Extended using the Crowdsource app.
Visualization of the fluid annotation interface in action on image from COCO dataset. Image credit: gamene, original image.
From time to time, we also help establish new kinds of challenges for the research community, so that we can all work together on solving difficult research problems. Often these accompany the release of a new dataset, but not always. This year, we established the Inclusive Images Challenge, which works towards more robust models that are free from many kinds of bias; the iNaturalist 2018 Challenge, which aims to enable computers to make fine-grained distinctions among visual categories (such as species of plants in an image); a Kaggle "Quick, Draw!" Doodle Recognition Challenge to create a better classifier for the "Quick, Draw!" game; and Conceptual Captions, a larger-scale image captioning dataset and challenge aimed at enabling better image captioning research.
Applications of AI to Other Fields In 2018, we have applied ML to a wide variety of problems in the physical and biological sciences. Using ML, we can supply scientists with the equivalent of hundreds or thousands of research assistants digging through data, which then frees the scientists to become more creative and productive.
A pre-trained TensorFlow model rates focus quality for a montage of microscope image patches of cells in Fiji (ImageJ). Hue and lightness of the borders denote predicted focus quality and prediction uncertainty, respectively.
Health For the past several years, we have been applying ML to health, an area that affects every one of us, and is also one where we believe ML can make a tremendous difference by augmenting the intuitions and experience of healthcare professionals. Our general approach in this space is to collaborate with healthcare organizations to tackle basic research problems (using feedback from clinical experts to make our results more robust), and then publish the results in well-respected, peer-reviewed scientific and clinical journals. Once the research has been clinically and scientifically validated, we then conduct user and HCI research to understand how we can deploy this in real-world clinical settings. In 2018, we expanded our efforts across the broad space of computer-aided diagnostics to clinical task predictions as well.
At the end of 2016, we published work showing that a model trained to assess retinal fundus images for signs of diabetic retinopathy was able to perform on par with, or slightly better than, U.S. medical-board-certified ophthalmologists at this task in a retrospective study. In 2018, we were able to show that by having the training images labeled by retinal specialists and by using an adjudicated protocol (where multiple retinal specialists convene and must arrive at a single collective assessment for each fundus image), we could arrive at a model that is on par with retinal specialists. Later, we published an evaluation showing that pairing ophthalmologists with this ML model allows them to make more accurate decisions than either could alone. We have deployed this diabetic retinopathy detection system in partnership with our Alphabet colleagues at Verily at over 10 sites, including Aravind Eye Hospitals in India and Rajavithi Hospital, affiliated with the Ministry of Health in Thailand.
On the left is a retinal fundus image graded as having moderate DR ("Mo") by an adjudication panel of ophthalmologists (ground truth). On the top right is an illustration of the predicted scores ("N" = no DR, "Mi" = Mild DR, "Mo" = Moderate DR) from the model. On the bottom right is the set of scores given by physicians without assistance ("Unassisted") and those who saw the model's predictions ("Grades Only").
When applying ML to historically collected data, it's important to understand the populations that have experienced human and structural biases in the past and how those biases have been codified in the data. Machine learning offers an opportunity to detect and address bias and to proactively advance health equity, and we are designing our systems to do just that.
Research Outreach We interact with the external research community in many different ways, including faculty engagement and student support. We are proud to host hundreds of undergraduate, M.S. and Ph.D. students as interns during the academic year, and to provide multi-year Ph.D. fellowships to students throughout North America, Europe, and the Middle East. In addition to financial support, each fellowship recipient is assigned one or more Google researchers as mentors, and we bring all the fellows together for an annual Google Ph.D. Fellowship Summit, where they are exposed to state-of-the-art research being pursued at Google and given the opportunity to network with Google's researchers as well as other Ph.D. Fellows from around the world. Complementing this fellowship program is the Google AI Residency, which allows people who want to learn to conduct deep learning research to spend a year working alongside, and being mentored by, researchers at Google. Now in its third year, the program embeds residents in various teams across Google's global offices, where they pursue research in areas such as machine learning, perception, algorithms and optimization, language understanding, healthcare and much more. With applications having just closed for the fourth year of this program, we are excited to see the research the new cohort of residents will pursue in 2019.
Each year, we also support a number of faculty members and students on research projects through our Google Faculty Research Awards program. In 2018, we also continued to host workshops at Google locations for faculty and graduate students in particular areas, including a workshop on AI/ML Research and Practice hosted in our Bangalore, India office, an Algorithms & Optimization Workshop hosted in our Zürich office, a workshop on healthcare applications of ML hosted in Sunnyvale and a workshop on Fairness and Bias in ML hosted in our Cambridge, MA office.
We believe that contributing openly to the broader research community is a critical part of supporting a healthy and productive research ecosystem. In addition to our open source and dataset releases, much of our research is published openly in top conference venues and journals, and we actively participate in the organization and sponsorship of conferences, all across the spectrum of different disciplines. For just a small sample, see our involvement at ICLR 2018, NAACL 2018, ICML 2018, CVPR 2018, NeurIPS 2018, ECCV 2018 and EMNLP 2018. Googlers also participated extensively in ASPLOS, HPCA, ICSE, IEEE Security & Privacy, OSDI, SIGCOMM, and many other conferences in 2018.
New Places, New Faces In 2018, we were excited to welcome many new people with a wide range of backgrounds into our research organization. We announced our first AI research office in Africa, located in Accra, Ghana. We expanded our AI research presence in Paris, Tokyo and Amsterdam, and opened a research lab in Princeton. We continue to hire talented people into our offices all over the world, and you can learn more about joining our research efforts here.
Looking Forward to 2019 This blog post summarizes just a small fraction of the research performed in 2018. As we look back on 2018, we're excited (and proud!) of the breadth and depth of what we have accomplished. In 2019, we look forward to having even more impact on Google's direction and products, as well as on the broader research and engineering community!
Posted by Jarrod McClean, Senior Research Scientist and Hartmut Neven, Director of Engineering, Google AI Quantum Team
Since its inception, the Google AI Quantum team has pushed to understand the role of quantum computing in machine learning. The existence of algorithms with provable advantages for global optimization suggests that quantum computers may be useful for training existing machine learning models more quickly, and we are building experimental quantum computers to investigate how intricate quantum systems can carry out these computations. While this may prove invaluable, it does not yet touch on the tantalizing idea that quantum computers might be able to learn about complex patterns in physical systems that conventional computers cannot capture in any reasonable amount of time.
Today we talk about two recent papers from the Google AI Quantum team that make progress towards understanding the power of quantum computers for learning tasks. The first constructs a quantum model of neural networks to investigate how a popular classification task might be carried out on quantum processors. In the second paper, we show how peculiar features of quantum geometry change the strategies for training these networks in comparison to their classical counterparts, and offer guidance towards more robust training of these networks.
In “Classification with Quantum Neural Networks on Near Term Processors”, we construct a model of quantum neural networks (QNNs) that is specifically designed to work on the quantum processors expected to be available in the near term. While the current work is primarily theoretical, the structure of these networks facilitates implementation and testing on quantum computers in the immediate future. These QNNs can be trained through supervised learning on labeled data, and we show that it is possible to train a QNN to classify images from the famous MNIST dataset. Follow-up work in this area with larger quantum devices may pit the ability of quantum networks to learn patterns against that of popular classical networks.
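To give a flavor of the general idea (not the paper's actual circuits), the toy NumPy sketch below evaluates a two-qubit parameterized circuit in the spirit of a QNN classifier: a data-dependent rotation encodes the input, a trainable entangling gate and rotation act on the state, and the sign of a Pauli-Z expectation on a designated readout qubit serves as the predicted binary label. The gate choices and parameter layout here are illustrative assumptions.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def rx(theta):
    """Single-qubit rotation exp(-i * theta/2 * X)."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * X

def zz(theta):
    """Two-qubit entangling gate exp(-i * theta/2 * Z (x) Z)."""
    return np.cos(theta / 2) * np.eye(4) - 1j * np.sin(theta / 2) * np.kron(Z, Z)

def predict(x, params):
    """Encode input x, entangle, rotate, and read out <Z> on the first qubit."""
    state = np.zeros(4, dtype=complex)
    state[0] = 1.0                                        # start in |00>
    state = np.kron(rx(np.pi * x), rx(np.pi * x)) @ state  # data encoding
    state = zz(params[0]) @ state                          # trainable entangler
    state = np.kron(rx(params[1]), I2) @ state             # trainable readout rotation
    z_readout = np.kron(Z, I2)                             # Pauli-Z on readout qubit
    expectation = np.real(np.conj(state) @ (z_readout @ state))
    return 1 if expectation > 0 else -1                    # thresholded label

print(predict(x=0.3, params=np.array([0.7, 1.2])))
```

Training such a model would then amount to adjusting `params` so that the predicted labels match the supervised labels, which is the setting the paper studies on near-term hardware.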
Quantum Neural Network for classification. Here we depict a sample quantum neural network, where in contrast to hidden layers in classical deep neural networks, the boxes represent entangling actions, or “quantum gates”, on qubits. In a superconducting qubit setup this could be enacted through a microwave control pulse corresponding to each box.
In “Barren Plateaus in Quantum Neural Network Training Landscapes”, we focus on the training of quantum neural networks and probe questions related to a key difficulty in classical neural networks: the problem of vanishing or exploding gradients. In conventional neural networks, a good unbiased initial guess for the neuron weights often involves randomization, although this can introduce some difficulties as well. Our paper shows that peculiar features of quantum geometry unequivocally prevent random initialization from being a good strategy in the quantum case, instead leading optimization into barren plateaus, vast regions of the landscape where gradients effectively vanish. The implications of this work may guide future strategies for initializing and training quantum neural networks.
QNN vanishing gradient: concentration of measure in high dimensional spaces. In very high dimensional spaces, such as those explored by quantum computers, the vast majority of states counterintuitively sit near the equator of the hypersphere (left). This means that any smooth function on this space will tend to take a value very close to its mean with overwhelming probability when selected at random (right).
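The concentration-of-measure effect in the figure is easy to reproduce numerically. The small sketch below (an illustrative experiment, not code from the paper) samples points uniformly on a high-dimensional unit sphere and checks the size of one fixed coordinate, the "pole-to-pole" direction: its typical magnitude shrinks like 1/sqrt(dim), so almost all samples hug the equator. By the same mechanism, the gradient of a randomly initialized quantum circuit concentrates tightly around its near-zero mean, which is the barren plateau.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unit_vectors(n_samples, dim):
    """Sample points uniformly on the unit sphere in `dim` dimensions."""
    v = rng.standard_normal((n_samples, dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

# The first coordinate plays the role of the pole-to-pole axis; its spread
# shrinks like 1/sqrt(dim), i.e., samples concentrate near the equator.
for dim in (3, 30, 300, 3000):
    first_coord = random_unit_vectors(10_000, dim)[:, 0]
    print(f"dim={dim:5d}  std of first coordinate ~ {first_coord.std():.4f} "
          f"(1/sqrt(dim) = {dim ** -0.5:.4f})")
```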
This research sets the stage for improvements in both the construction and training of quantum neural networks. In particular, experimental realizations of quantum neural networks using hardware at Google will enable rapid exploration of quantum neural networks in the near term. We hope that the insights from the geometry of these states will lead to new algorithms to train these networks that will be essential to unlocking their full potential.
Posted by Anna Ukhanova, Program Manager, Google AI Zürich
Progress in machine learning (ML) is happening so rapidly that it can sometimes feel like any idea or algorithm more than two years old is already outdated or superseded by something better. However, old ideas sometimes remain relevant even when a large fraction of the scientific community has turned away from them. This is often a question of context: an idea that may seem to be a dead end in one context may become wildly successful in a different one. In the specific case of deep learning (DL), the growth in the availability of both data and computing power renewed interest in the area and significantly influenced research directions.
The NIPS 2008 paper “The Trade-Offs of Large Scale Learning” by Léon Bottou (then at NEC Labs, now at Facebook AI Research) and Olivier Bousquet (Google AI, Zürich) is a good example of this phenomenon. As the recent recipient of the NeurIPS 2018 Test of Time Award, this seminal work investigated the interplay between data and computation in ML, showing that if one is limited by computing power but can make use of a large dataset, it is more efficient to perform a small amount of computation on many individual training examples rather than to perform extensive computation on a subset of the data. This demonstrated the power of an old algorithm, stochastic gradient descent, which is nowadays used in pretty much all applications of DL.
Optimization and the Challenge of Scale Many ML algorithms can be thought of as the combination of two main ingredients:
A model, which is a set of possible functions that will be used to fit the data.
An optimization algorithm which specifies how to find the best function in that set.
Back in the 1990s, the datasets used in ML were much smaller than the ones in use today, and while artificial neural networks had already led to some successes, they were considered hard to train. In the early 2000s, with the introduction of kernel machines (SVMs in particular), neural networks went out of fashion. Simultaneously, attention shifted away from the optimization algorithm that had been used to train neural networks (stochastic gradient descent) toward the ones used for kernel machines (quadratic programming). One important difference is that in the former case, training examples are used one at a time to perform gradient steps (this is called “stochastic”), while in the latter case, all training examples are used at each iteration (this is called “batch”).
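The two ingredients and the stochastic/batch distinction can both be made concrete with a small sketch (illustrative, not from the paper): the model is the set of linear functions w·x, the optimization algorithm is gradient descent on the squared loss, and the only difference between the two update rules is how many examples feed each gradient step.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 5
X = rng.standard_normal((n, d))                        # training inputs
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)  # noisy targets
w = np.zeros(d)                                        # the "model": w . x
lr = 0.1

def batch_step(w):
    """One batch gradient step: every training example contributes."""
    grad = X.T @ (X @ w - y) / n
    return w - lr * grad

def stochastic_step(w, i):
    """One stochastic gradient step: a single example i drives the update."""
    grad = (X[i] @ w - y[i]) * X[i]
    return w - lr * grad

# Per iteration, the stochastic step costs O(d) while the batch step costs
# O(n * d); the paper's question is which spends a fixed compute budget better
# when data is plentiful.
w = stochastic_step(batch_step(w), i=0)
```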
As the size of the training sets increased, the efficiency of optimization algorithms to handle large amounts of data became a bottleneck. For example, in the case of quadratic programming, running time scales at least quadratically in the number of examples. In other words, if you double your training set size, your training will take at least 4 times longer. Hence, lots of effort went into trying to make these algorithms scale to larger training sets (see for example Large Scale Kernel Machines).
People who had experience training neural networks knew that stochastic gradient descent was comparatively easy to scale to large datasets, but unfortunately its convergence is very slow (it takes many iterations to reach an accuracy comparable to that of a batch algorithm), so it wasn’t clear that this would be a solution to the scaling problem.
Stochastic Algorithms Scale Better In the context of ML, the number of iterations needed to optimize the cost function is actually not the main concern: there is no point in tuning your model to perfection, since you will essentially just “overfit” to the training data. So why not reduce the computational effort spent tuning the model and instead spend that effort processing more data?
The work of Léon and Olivier provided a formal study of this phenomenon: by considering access to a large amount of data and assuming the limiting factor is computation, they showed that it is better to perform a minimal amount of computation on each individual training example (thus processing more of them) rather than performing extensive computation on a smaller amount of data.
In doing so, they also demonstrated that among various possible optimization algorithms, stochastic gradient descent is the best. This was confirmed by many experiments and led to a renewed interest in online optimization algorithms which are now in extensive use in ML.
Mysteries Remain In the following years, many variants of stochastic gradient descent were developed, both in the convex case and in the non-convex one (particularly relevant for DL). The most common variant now is so-called “mini-batch” SGD, where one considers a small number (~10-100) of training examples at each iteration and performs several passes over the training set, with a couple of clever tricks to scale the gradient appropriately. Most ML libraries provide a default implementation of such an algorithm, and it is arguably one of the pillars of DL.
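Extending the earlier sketch, the mini-batch variant looks roughly like the following (hyperparameters here are illustrative): shuffle the data each epoch, slice it into small batches, average the gradient within each batch, and repeat for several passes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 5
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

w = np.zeros(d)
lr, batch_size, epochs = 0.05, 32, 3          # illustrative hyperparameters

for epoch in range(epochs):                   # several passes over the data
    order = rng.permutation(n)                # reshuffle each epoch
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]           # a small batch of examples
        residual = X[idx] @ w - y[idx]
        grad = X[idx].T @ residual / len(idx)            # gradient averaged over the batch
        w -= lr * grad

print("learned weights:", np.round(w, 3))
```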
While this analysis provided a solid foundation for understanding the properties of this algorithm, the amazing and sometimes surprising successes of DL continue to raise many more questions for the scientific community. In particular, the role of this algorithm in the generalization properties of deep networks has been repeatedly demonstrated but is still poorly understood. This means that a lot of fascinating questions are yet to be explored which could lead to a better understanding of the algorithms currently in use and the development of even more efficient algorithms in the future.
The perspective proposed by Léon and Olivier in their collaboration 10 years ago provided a significant boost to the development of the algorithm that is nowadays the workhorse of ML systems that benefit our lives daily, and we offer our sincere congratulations to both authors on this well-deserved award.