
Google Brain Residency Program – 7 months in and looking ahead



“Beyond being incredibly instructive, the Google Brain Residency program has been a truly affirming experience. Working alongside people who truly love what they do--and are eager to help you develop your own passion--has vastly increased my confidence in my interests, my ability to explore them, and my plans for the near future.”
-Akosua Busia, B.S. Mathematical and Computational Science, Stanford University ‘16
2016 Google Brain Resident

In October 2015 we launched the Google Brain Residency, a 12-month program focused on jumpstarting a career for those interested in machine learning and deep learning research. This program is an opportunity to get hands-on experience using the state-of-the-art infrastructure available at Google, and offers the chance to work alongside top researchers within the Google Brain team.

Our first group of residents arrived in June 2016, working with researchers on problems at the forefront of machine learning. The wide array of topics studied by residents reflects the diversity of the residents themselves — some come to the program as new graduates with degrees ranging from BAs to PhDs, in fields from computer science, physics and mathematics to biology and neuroscience, while other residents come with years of industry experience under their belts. They have all come with a passion for learning how to conduct machine learning research.

The breadth of research being done by the Google Brain Team, along with flexible resident-mentor pairing, ensures that residents with interests in machine learning algorithms and reinforcement learning, natural language understanding, robotics, neuroscience, genetics and more are able to find good mentors to help them pursue their ideas and publish interesting work. And just seven months into the program, the Residents are already making an impact in the research field.

To date, Google Brain Residents have submitted a total of 21 papers to leading machine learning conferences, spanning topics from enhancing low-resolution images to building neural networks that in turn design novel, task-specific neural network architectures. Of those 21 papers, 5 were accepted at the recent BayLearn Conference (two of which, “Mean Field Neural Networks” and “Regularizing Neural Networks by Penalizing Their Output Distribution”, were presented in oral sessions), 2 were accepted at the NIPS 2016 Adversarial Training workshop, and another at ISMIR 2016 (see the full list of papers, including the 14 submissions to ICLR 2017, after the figures below).
An LSTM cell (left) and a state-of-the-art RNN cell found using a neural network (right). This is an example of a novel architecture found using the approach presented in “Neural Architecture Search with Reinforcement Learning” (B. Zoph and Q. V. Le, submitted to ICLR 2017). This paper uses a neural network to generate novel RNN cell architectures that outperform the widely used LSTM on a variety of different tasks.
The training accuracy for neural networks, colored from black (random chance) to red (high accuracy). Overlaid in white dashed lines are the theoretical predictions showing the boundary between trainable and untrainable networks. (a) Networks with no dropout. (b)-(d) Networks with dropout rates of 0.01, 0.02, 0.06 respectively. This research explores whether theoretical calculations can replace large hyperparameter searches. For more details, read “Deep Information Propagation” (S. S. Schoenholz, J. Gilmer, S. Ganguli, J. Sohl-Dickstein, submitted to ICLR 2017).

Accepted conference papers
(Google Brain Residents marked with asterisks)

Unrolled Generative Adversarial Networks
Luke Metz*, Ben Poole, David Pfau, Jascha Sohl-Dickstein
NIPS 2016 Adversarial Training Workshop (oral presentation)

Conditional Image Synthesis with Auxiliary Classifier GANs
Augustus Odena*, Chris Olah, Jon Shlens
NIPS 2016 Adversarial Training Workshop (oral presentation)

Regularizing Neural Networks by Penalizing Their Output Distribution
Gabriel Pereyra*, George Tucker, Lukasz Kaiser, Geoff Hinton
BayLearn 2016 (oral presentation)

Mean Field Neural Networks
Samuel S. Schoenholz*, Justin Gilmer*, Jascha Sohl-Dickstein
BayLearn 2016 (oral presentation)

Learning to Remember
Aurko Roy, Ofir Nachum*, Łukasz Kaiser, Samy Bengio
BayLearn 2016 (poster session)

Towards Generating Higher Resolution Images with Generative Adversarial Networks
Augustus Odena*, Jonathon Shlens
BayLearn 2016 (poster session)

Multi-Task Convolutional Music Models
Diego Ardila, Cinjon Resnick*, Adam Roberts, Douglas Eck
BayLearn 2016 (poster session)

Audio DeepDream: Optimizing Raw Audio With Convolutional Networks
Diego Ardila, Cinjon Resnick*, Adam Roberts, Douglas Eck
ISMIR 2016 (poster session)

Papers under review (Google Brain Residents marked with asterisks)

Learning to Remember Rare Events
Lukasz Kaiser, Ofir Nachum*, Aurko Roy, Samy Bengio
Submitted to ICLR 2017

Neural Combinatorial Optimization with Reinforcement Learning
Irwan Bello*, Hieu Pham*, Quoc V. Le, Mohammad Norouzi, Samy Bengio
Submitted to ICLR 2017

HyperNetworks
David Ha*, Andrew Dai, Quoc V. Le
Submitted to ICLR 2017

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Noam Shazeer, Azalia Mirhoseini*, Krzysztof Maziarz, Quoc Le, Jeff Dean
Submitted to ICLR 2017

Neural Architecture Search with Reinforcement Learning
Barret Zoph* and Quoc Le
Submitted to ICLR 2017

Deep Information Propagation
Samuel Schoenholz*, Justin Gilmer*, Surya Ganguli, Jascha Sohl-Dickstein
Submitted to ICLR 2017

Capacity and Learnability in Recurrent Neural Networks
Jasmine Collins*, Jascha Sohl-Dickstein, David Sussillo
Submitted to ICLR 2017

Unrolled Generative Adversarial Networks
Luke Metz*, Ben Poole, David Pfau, Jascha Sohl-Dickstein
Submitted to ICLR 2017

Conditional Image Synthesis with Auxiliary Classifier GANs
Augustus Odena*, Chris Olah, Jon Shlens
Submitted to ICLR 2017

Generating Long and Diverse Responses with Neural Conversation Models
Louis Shao, Stephan Gouws, Denny Britz*, Anna Goldie, Brian Strope, Ray Kurzweil
Submitted to ICLR 2017

Intelligible Language Modeling with Input Switched Affine Networks
Jakob Foerster, Justin Gilmer*, Jan Chorowski, Jascha Sohl-dickstein, David Sussillo
Submitted to ICLR 2017

Regularizing Neural Networks by Penalizing Confident Output Distributions
Gabriel Pereyra*, George Tucker*, Jan Chorowski, Lukasz Kaiser, Geoffrey Hinton
Submitted to ICLR 2017

Unsupervised Perceptual Rewards for Imitation Learning
Pierre Sermanet, Kelvin Xu*, Sergey Levine
Submitted to ICLR 2017

Improving policy gradient by exploring under-appreciated rewards
Ofir Nachum*, Mohammad Norouzi, Dale Schuurmans
Submitted to ICLR 2017

Protein Secondary Structure Prediction Using Deep Multi-scale Convolutional Neural Networks and Next-Step Conditioning
Akosua Busia*, Jasmine Collins*, Navdeep Jaitly

The diverse and collaborative atmosphere fostered by the Brain team has resulted in a group of researchers making great strides in a wide range of research areas, which we are excited to share with the broader community. We look forward to even more innovative research from our 2016 residents, and are excited for the program to continue into its second year!

We are currently accepting applications for the 2017 Google Brain Residency Program. To learn more about the program and to submit your application, visit g.co/brainresidency. Applications close January 13th, 2017.

Equality of Opportunity in Machine Learning



As machine learning technology progresses rapidly, there is much interest in understanding its societal impact. A particularly successful branch of machine learning is supervised learning. With enough past data and computational resources, learning algorithms often produce surprisingly effective predictors of future events. To take one hypothetical example: an algorithm could be used to predict with high accuracy who will pay back their loan. Lenders might then use such a predictor as an aid in deciding who should receive a loan in the first place. Decisions based on machine learning can be both incredibly useful and have a profound impact on our lives.

Even the best predictors make mistakes. Although machine learning aims to minimize the chance of a mistake, how do we prevent certain groups from experiencing a disproportionate share of these mistakes? Consider the case of a group that we have relatively little data on and whose characteristics differ from those of the general population in ways that are relevant to the prediction task. As prediction accuracy is generally correlated with the amount of data available for training, it is likely that incorrect predictions will be more common in this group. A predictor might, for example, end up flagging too many individuals in this group as ‘high risk of default’ even though they pay back their loan. When group membership coincides with a sensitive attribute, such as race, gender, disability, or religion, this situation can lead to unjust or prejudicial outcomes.

Despite the need, a vetted methodology in machine learning for preventing this kind of discrimination based on sensitive attributes has been lacking. A naive approach might require a set of sensitive attributes to be removed from the data before doing anything else with it. This idea of “fairness through unawareness,” however, fails due to the existence of “redundant encodings.” Even if a particular attribute is not present in the data, combinations of other attributes can act as a proxy.

Another common approach, called demographic parity, asks that the prediction be uncorrelated with the sensitive attribute. This might sound intuitively desirable, but the outcome itself is often correlated with the sensitive attribute. For example, the incidence of heart failure is substantially more common in men than in women. When predicting such a medical condition, it is therefore neither realistic nor desirable to prevent all correlation between the predicted outcome and group membership.

Equal Opportunity

Taking these conceptual difficulties into account, we’ve proposed a methodology for measuring and preventing discrimination based on a set of sensitive attributes. Our framework not only helps to scrutinize predictors for possible concerns, but also shows how to adjust a given predictor, if need be, so as to strike a better tradeoff between classification accuracy and non-discrimination.

At the heart of our approach is the idea that individuals who qualify for a desirable outcome should have an equal chance of being correctly classified for this outcome. In our fictional loan example, it means the rate of ‘low risk’ predictions among people who actually pay back their loan should not depend on a sensitive attribute like race or gender. We call this principle equality of opportunity in supervised learning.
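To make the definition concrete, here is a minimal Python sketch on synthetic, hypothetical data. It checks what equality of opportunity asks for — the rate of ‘low risk’ predictions among people who actually repay should match across groups — and shows one simple way to satisfy it by choosing per-group score thresholds. This is only an illustration of the principle, not the optimal adjustment derived in our paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10000
group = rng.integers(0, 2, size=n)                 # two groups, 0 and 1
quality = rng.uniform(size=n)                      # latent creditworthiness
repays = (rng.uniform(size=n) < quality).astype(int)

# Suppose group 1's scores are noisier, e.g. because there is less data for it.
noise = np.where(group == 1, 0.3, 0.05) * rng.normal(size=n)
scores = quality + noise                           # the model's risk score

def tpr(threshold, g):
    """Rate of 'low risk' predictions among members of group g who repay."""
    mask = (group == g) & (repays == 1)
    return float((scores[mask] >= threshold).mean())

# A single shared threshold gives the two groups unequal true positive rates.
print(tpr(0.5, 0), tpr(0.5, 1))

# Equality of opportunity: choose per-group thresholds that equalize the rates.
target = 0.8
thresholds = {g: np.quantile(scores[(group == g) & (repays == 1)], 1 - target)
              for g in (0, 1)}
print({g: round(tpr(thresholds[g], g), 3) for g in (0, 1)})  # both close to 0.8
```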

When implemented, our framework also improves incentives by shifting the cost of poor predictions from the individual to the decision maker, who can respond by investing in improved prediction accuracy. Perfect predictors always satisfy our notion, showing that the central goal of building more accurate predictors is well aligned with the goal of avoiding discrimination.

Learn more

To explore the ideas in this blog post on your own, our Big Picture team created a beautiful interactive visualization of the different concepts and tradeoffs. So, head on over to their page to learn more.

Once you’ve walked through the demo, please check out the full version of our paper, a joint work with Eric Price (UT Austin) and Nati Srebro (TTI Chicago). We’ll present the paper at this year’s Conference on Neural Information Processing Systems (NIPS) in Barcelona. So, if you’re around, be sure to stop by and chat with one of us.

Our paper is by no means the final word on this important and complex topic. It joins an ongoing, multidisciplinary conversation among researchers. We hope to inspire future research that will sharpen the discussion of the different achievable tradeoffs surrounding discrimination and machine learning, as well as the development of tools that will help practitioners address these challenges.

ACL 2016 & Research at Google



This week, Berlin hosts the 2016 Annual Meeting of the Association for Computational Linguistics (ACL 2016), the premier conference of the field of computational linguistics, covering a broad spectrum of diverse research areas that are concerned with computational approaches to natural language. As a leader in Natural Language Processing (NLP) and a Platinum Sponsor of the conference, Google will be on hand to showcase research interests that include syntax, semantics, discourse, conversation, multilingual modeling, sentiment analysis, question answering, summarization, and generally building better learners using labeled and unlabeled data, state-of-the-art modeling, and learning from indirect supervision.

Our systems are used in numerous ways across Google, impacting user experience in search, mobile, apps, ads, translate and more. Our work spans the range of traditional NLP tasks, with general-purpose syntax and semantic algorithms underpinning more specialized systems.
Our researchers are experts in natural language processing and machine learning, and combine methodological research with applied science, and our engineers are equally involved in long-term research efforts and driving immediate applications of our technology.

If you’re attending ACL 2016, we hope that you’ll stop by the booth to check out some demos, meet our researchers and discuss projects and opportunities at Google that go into solving interesting problems for billions of people. Learn more about Google research being presented at ACL 2016 below (Googlers highlighted in blue), and visit the Natural Language Understanding Team page at g.co/NLUTeam.

Papers
Generalized Transition-based Dependency Parsing via Control Parameters
Bernd Bohnet, Ryan McDonald, Emily Pitler, Ji Ma

Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning
Yulia Tsvetkov, Manaal Faruqui, Wang Ling (Google DeepMind), Chris Dyer (Google DeepMind)

Morpho-syntactic Lexicon Generation Using Graph-based Semi-supervised Learning (TACL)
Manaal Faruqui, Ryan McDonald, Radu Soricut

Many Languages, One Parser (TACL)
Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer (Google DeepMind)*, Noah A. Smith

Latent Predictor Networks for Code Generation
Wang Ling (Google DeepMind), Phil Blunsom (Google DeepMind), Edward Grefenstette (Google DeepMind), Karl Moritz Hermann (Google DeepMind), Tomáš Kočiský (Google DeepMind), Fumin Wang (Google DeepMind), Andrew Senior (Google DeepMind)

Collective Entity Resolution with Multi-Focal Attention
Amir Globerson, Nevena Lazic, Soumen Chakrabarti, Amarnag Subramanya, Michael Ringgaard, Fernando Pereira

Plato: A Selective Context Model for Entity Resolution (TACL)
Nevena Lazic, Amarnag Subramanya, Michael Ringgaard, Fernando Pereira

WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia
Daniel Hewlett, Alexandre Lacoste, Llion Jones, Illia Polosukhin, Andrew Fandrianto, Jay Han, Matthew Kelcey, David Berthelot

Stack-propagation: Improved Representation Learning for Syntax
Yuan Zhang, David Weiss

Cross-lingual Models of Word Embeddings: An Empirical Comparison
Shyam Upadhyay, Manaal Faruqui, Chris Dyer (Google DeepMind), Dan Roth

Globally Normalized Transition-Based Neural Networks (Outstanding Papers Session)
Daniel Andor, Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro Presta, Kuzman Ganchev, Slav Petrov, Michael Collins

Posters
Cross-lingual projection for class-based language models
Beat Gfeller, Vlad Schogol, Keith Hall

Synthesizing Compound Words for Machine Translation
Austin Matthews, Eva Schlinger*, Alon Lavie, Chris Dyer (Google DeepMind)*

Cross-Lingual Morphological Tagging for Low-Resource Languages
Jan Buys, Jan A. Botha

Workshops
1st Workshop on Representation Learning for NLP
Keynote Speakers include: Raia Hadsell (Google DeepMind)
Workshop Organizers include: Edward Grefenstette (Google DeepMind), Phil Blunsom (Google DeepMind), Karl Moritz Hermann (Google DeepMind)
Program Committee members include: Tomáš Kočiský (Google DeepMind), Wang Ling (Google DeepMind), Ankur Parikh (Google), John Platt (Google), Oriol Vinyals (Google DeepMind)

1st Workshop on Evaluating Vector-Space Representations for NLP
Contributed Papers:
Problems With Evaluation of Word Embeddings Using Word Similarity Tasks
Manaal Faruqui, Yulia Tsvetkov, Pushpendre Rastogi, Chris Dyer (Google DeepMind)*

Correlation-based Intrinsic Evaluation of Word Vector Representations
Yulia Tsvetkov, Manaal Faruqui, Chris Dyer (Google DeepMind)

SIGFSM Workshop on Statistical NLP and Weighted Automata
Contributed Papers:
Distributed representation and estimation of WFST-based n-gram models
Cyril Allauzen, Michael Riley, Brian Roark

Pynini: A Python library for weighted finite-state grammar compilation
Kyle Gorman


* Work completed at CMU

CVPR 2016 & Research at Google



This week, Las Vegas hosts the 2016 Conference on Computer Vision and Pattern Recognition (CVPR 2016), the premier annual computer vision event comprising the main conference and several co-located workshops and short courses. As a leader in computer vision research, Google has a strong presence at CVPR 2016, with many Googlers presenting papers and invited talks at the conference, tutorials and workshops.

We congratulate Google Research Scientist Ce Liu and Google Faculty Advisor Abhinav Gupta, who were selected as this year’s recipients of the PAMI Young Researcher Award for outstanding research contributions within computer vision. We also congratulate Googler Henrik Stewenius for receiving the Longuet-Higgins Prize, a retrospective award that recognizes up to two CVPR papers from ten years ago that have made a significant impact on computer vision research, for his 2006 CVPR paper “Scalable Recognition with a Vocabulary Tree”, co-authored with David Nister.

If you are attending CVPR this year, please stop by our booth and chat with our researchers about the projects and opportunities at Google that go into solving interesting problems for hundreds of millions of people. The Google booth will also showcase several recent efforts, including the technology behind Motion Stills and a live demo of neural network-based image compression. Learn more about our research being presented at CVPR 2016 in the list below (Googlers highlighted in blue).

Oral Presentations
Generation and Comprehension of Unambiguous Object Descriptions
Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan L. Yuille, Kevin Murphy

Detecting Events and Key Actors in Multi-Person Videos
Vignesh Ramanathan, Jonathan Huang, Sami Abu-El-Haija, Alexander Gorban, Kevin Murphy, Li Fei-Fei

Spotlight Session: 3D Reconstruction
DeepStereo: Learning to Predict New Views From the World’s Imagery
John Flynn, Ivan Neulander, James Philbin, Noah Snavely

Posters
Discovering the Physical Parts of an Articulated Object Class From Multiple Videos
Luca Del Pero, Susanna Ricco, Rahul Sukthankar, Vittorio Ferrari

Blockout: Dynamic Model Selection for Hierarchical Deep Networks
Calvin Murdock, Zhen Li, Howard Zhou, Tom Duerig

Rethinking the Inception Architecture for Computer Vision
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, Zbigniew Wojna

Improving the Robustness of Deep Neural Networks via Stability Training
Stephan Zheng, Yang Song, Thomas Leung, Ian Goodfellow

Semantic Image Segmentation With Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform
Liang-Chieh Chen, Jonathan T. Barron, George Papandreou, Kevin Murphy, Alan L. Yuille

Tutorial
Optimization Algorithms for Subset Selection and Summarization in Large Data Sets
Ehsan Elhamifar, Jeff Bilmes, Alex Kulesza, Michael Gygli

Workshops
Perceptual Organization in Computer Vision: The Role of Feedback in Recognition and Reorganization
Organizers: Katerina Fragkiadaki, Phillip Isola, Joao Carreira
Invited talks: Viren Jain, Jitendra Malik

VQA Challenge Workshop
Invited talks: Jitendra Malik, Kevin Murphy

Women in Computer Vision
Invited talk: Caroline Pantofaru

Computational Models for Learning Systems and Educational Assessment
Invited talk: Jonathan Huang

Large-Scale Scene Understanding (LSUN) Challenge
Invited talk: Jitendra Malik

Large Scale Visual Recognition and Retrieval: BigVision 2016
General Chairs: Jason Corso, Fei-Fei Li, Samy Bengio

ChaLearn Looking at People
Invited talk: Florian Schroff

Medical Computer Vision
Invited talk: Ramin Zabih

Bringing Precision to the AI Safety Discussion



We believe that AI technologies are likely to be overwhelmingly useful and beneficial for humanity. But part of being a responsible steward of any new technology is thinking through potential challenges and how best to address any associated risks. So today we’re publishing a technical paper, Concrete Problems in AI Safety, a collaboration among scientists at Google, OpenAI, Stanford and Berkeley.

While possible AI safety risks have received a lot of public attention, most previous discussion has been very hypothetical and speculative. We believe it’s essential to ground concerns in real machine learning research, and to start developing practical approaches for engineering AI systems that operate safely and reliably.

We’ve outlined five problems we think will be very important as we apply AI in more general circumstances. These are all forward-thinking, long-term research questions; they are minor issues today, but important to address for future systems:

  • Avoiding Negative Side Effects: How can we ensure that an AI system will not disturb its environment in negative ways while pursuing its goals, e.g. a cleaning robot knocking over a vase because it can clean faster by doing so?
  • Avoiding Reward Hacking: How can we avoid gaming of the reward function? For example, we don’t want this cleaning robot simply covering over messes with materials it can’t see through.
  • Scalable Oversight: How can we efficiently ensure that a given AI system respects aspects of the objective that are too expensive to be frequently evaluated during training? For example, if an AI system gets human feedback as it performs a task, it needs to use that feedback efficiently because asking too often would be annoying.
  • Safe Exploration: How do we ensure that an AI system doesn’t make exploratory moves with very negative repercussions? For example, maybe a cleaning robot should experiment with mopping strategies, but clearly it shouldn’t try putting a wet mop in an electrical outlet.
  • Robustness to Distributional Shift: How do we ensure that an AI system recognizes, and behaves robustly, when it’s in an environment very different from its training environment? For example, heuristics learned for a factory floor may not be safe enough for an office.

We go into more technical detail in the paper. The machine learning research community has already thought quite a bit about most of these problems and many related issues, but we think there’s a lot more work to be done.

We believe in rigorous, open, cross-institution work on how to build machine learning systems that work as intended. We’re eager to continue our collaborations with other research groups to make positive progress on AI.

ICML 2016 & Research at Google



This week, New York hosts the 2016 International Conference on Machine Learning (ICML 2016), a premier annual Machine Learning event supported by the International Machine Learning Society (IMLS). Machine Learning is a key focus area at Google, with highly active research groups exploring virtually all aspects of the field, including deep learning and more classical algorithms.

We work on an extremely wide variety of machine learning problems that arise from a broad range of applications at Google. One particularly important setting is that of large-scale learning, where we utilize scalable tools and architectures to build machine learning systems that work with large volumes of data that often preclude the use of standard single-machine training algorithms. In doing so, we are able to solve deep scientific problems and engineering challenges, exploring theory as well as application, in areas of language, speech, translation, music, visual processing and more.

As Gold Sponsor, Google has a strong presence at ICML 2016 with many Googlers publishing their research and hosting workshops. If you’re attending, we hope you’ll visit the Google booth and talk with our researchers to learn more about the exciting work, creativity and fun that goes into solving interesting ML problems that impact millions of people. You can also learn more about our research being presented at ICML 2016 in the list below (Googlers highlighted in blue).

ICML 2016 Organizing Committee
Area Chairs include: Corinna Cortes, John Blitzer, Maya Gupta, Moritz Hardt, Samy Bengio

IMLS
Board Members include: Corinna Cortes

Accepted Papers
ADIOS: Architectures Deep In Output Space
Moustapha Cisse, Maruan Al-Shedivat, Samy Bengio

Associative Long Short-Term Memory
Ivo Danihelka, Greg Wayne, Benigno Uria, Nal Kalchbrenner, Alex Graves

Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu

Binary embeddings with structured hashed projections
Anna Choromanska, Krzysztof Choromanski, Mariusz Bojarski, Tony Jebara, Sanjiv Kumar, Yann LeCun

Discrete Distribution Estimation Under Local Privacy
Peter Kairouz, Keith Bonawitz, Daniel Ramage

Dueling Network Architectures for Deep Reinforcement Learning (Best Paper Award recipient)
Ziyu Wang, Nando de Freitas, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot

Exploiting Cyclic Symmetry in Convolutional Neural Networks
Sander Dieleman, Jeffrey De Fauw, Koray Kavukcuoglu

Fast Constrained Submodular Maximization: Personalized Data Summarization
Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, Amin Karbasi

Greedy Column Subset Selection: New Bounds and Distributed Algorithms
Jason Altschuler, Aditya Bhaskara, Gang Fu, Vahab Mirrokni, Afshin Rostamizadeh, Morteza Zadimoghaddam

Horizontally Scalable Submodular Maximization
Mario Lucic, Olivier Bachem, Morteza Zadimoghaddam, Andreas Krause

Continuous Deep Q-Learning with Model-based Acceleration
Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine

Meta-Learning with Memory-Augmented Neural Networks
Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, Timothy Lillicrap

One-Shot Generalization in Deep Generative Models
Danilo Rezende, Shakir Mohamed, Daan Wierstra

Pixel Recurrent Neural Networks (Best Paper Award recipient)
Aaron Van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu

Pricing a low-regret seller
Hoda Heidari, Mohammad Mahdian, Umar Syed, Sergei Vassilvitskii, Sadra Yazdanbod

Primal-Dual Rates and Certificates
Celestine Dünner, Simone Forte, Martin Takac, Martin Jaggi

Recommendations as Treatments: Debiasing Learning and Evaluation
Tobias Schnabel, Thorsten Joachims, Adith Swaminathan, Ashudeep Singh, Navin Chandak

Recycling Randomness with Structure for Sublinear Time Kernel Expansions
Krzysztof Choromanski, Vikas Sindhwani

Train faster, generalize better: Stability of stochastic gradient descent
Moritz Hardt, Ben Recht, Yoram Singer

Variational Inference for Monte Carlo Objectives
Andriy Mnih, Danilo Rezende

Workshops
Abstraction in Reinforcement Learning
Organizing Committee: Daniel Mankowitz, Timothy Mann, Shie Mannor
Invited Speaker: David Silver

Deep Learning Workshop
Organizers: Antoine Bordes, Kyunghyun Cho, Emily Denton, Nando de Freitas, Rob Fergus
Invited Speaker: Raia Hadsell

Neural Networks Back To The Future
Organizers: Léon Bottou, David Grangier, Tomas Mikolov, John Platt

Data-Efficient Machine Learning
Organizers: Marc Deisenroth, Shakir Mohamed, Finale Doshi-Velez, Andreas Krause, Max Welling

On-Device Intelligence
Organizers: Vikas Sindhwani, Daniel Ramage, Keith Bonawitz, Suyog Gupta, Sachin Talathi
Invited Speakers: Hartwig Adam, H. Brendan McMahan

Online Advertising Systems
Organizing Committee: Sharat Chikkerur, Hossein Azari, Edoardo Airoldi
Opening Remarks: Hossein Azari
Invited Speakers: Martin Pál, Todd Phillips

Anomaly Detection 2016
Organizing Committee: Nico Goernitz, Marius Kloft, Vitaly Kuznetsov

Tutorials
Deep Reinforcement Learning
David Silver

Rigorous Data Dredging: Theory and Tools for Adaptive Data Analysis
Moritz Hardt, Aaron Roth

Quantum annealing with a digital twist



One of the key benefits of quantum computing is that it has the potential to solve some of the most complex problems in nature, from physics to chemistry to biology. For example, when attempting to calculate protein folding, or when exploring reaction catalysts and “designer” molecules, one can look at computational challenges as optimization problems, and represent the different configurations of a molecule as an energy landscape in a quantum computer. By letting the system cool, or “anneal”, one finds the lowest energy state in the landscape - the most stable form of the molecule. Thanks to the peculiarities of quantum mechanics, the correct answer simply drops out at the end of the quantum computation. In fact, many tough problems can be dealt with this way; this combination of simplicity and generality makes the approach appealing.

But finding the lowest energy state in a system is like being put in the Alps, and being told to find the lowest elevation - it’s easy to get stuck in a “local” valley, and not know that there is an even lower point elsewhere. Therefore, we use a different approach: We start with a very simple energy landscape - a flat meadow - and initialize the system of quantum bits (qubits) to represent the known lowest energy point, or “ground state”, in that landscape. We then begin to adjust the simple landscape towards one that represents the problem we are trying to solve - from the smooth meadow to the highly uneven terrain of the Alps. Here’s the fun part: if one evolves the landscape very slowly, the ground state of the qubits also evolves, so that they stay in the ground state of the changing system. This is called “adiabatic quantum computing”, and qubits exploit quantum tunneling to ensure they always find the lowest energy "valley" in the changing system.

While this is great in theory, getting this to work in practice is challenging, as you have to set up the energy landscape using the available qubit interactions. Ideally you’d have multiple interactions going on between all of the qubits, but for a large-scale solver the requirements to accurately keep track of these interactions become enormous. Realistically, the connectivity has to be reduced, but this presents a major limitation for the computational possibilities.

In "Digitized adiabatic quantum computing with a superconducting circuit", published in Nature, we’ve overcome this obstacle by giving quantum annealing a digital twist. With a limited connectivity between qubits you can still construct any of the desired interactions: Whether the interaction is ferromagnetic (the quantum bits prefer an aligned) or antiferromagnetic (anti-aligned orientation), or even defined along an arbitrary different direction, you can make it happen using easy to combine discrete building blocks. In this case, the blocks we use are the logic gates that we've been developing with our superconducting architecture.
Superconducting quantum chip with nine qubits. Each qubit (cross-shaped structures in the center) is connected to its neighbors and individually controlled. Photo credit: Julian Kelly.
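For readers who want to see the digitized idea in miniature, here is a rough Python sketch that simulates a Trotterized adiabatic evolution on a toy three-qubit Ising problem. The qubit count, schedule, and couplings are illustrative choices for exposition, not the hardware experiment reported in the paper.

```python
import numpy as np
from scipy.linalg import expm

# Pauli matrices
I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)

def on_qubit(op, k, n):
    """Embed a single-qubit operator `op` on qubit k of an n-qubit register."""
    out = np.array([[1]], dtype=complex)
    for i in range(n):
        out = np.kron(out, op if i == k else I2)
    return out

n = 3
# Driver Hamiltonian: transverse field; its ground state is the uniform superposition.
H0 = -sum(on_qubit(X, k, n) for k in range(n))
# Toy problem Hamiltonian: ferromagnetic Ising chain; ground states are |000> and |111>.
Hp = -sum(on_qubit(Z, k, n) @ on_qubit(Z, k + 1, n) for k in range(n - 1))

psi = np.ones(2**n, dtype=complex) / np.sqrt(2**n)  # ground state of H0

steps, dt = 100, 0.1
for j in range(1, steps + 1):
    s = j / steps  # anneal parameter, sweeping the landscape from H0 to Hp
    # One digitized (Trotterized) step of evolving under H(s) = (1-s)*H0 + s*Hp
    psi = expm(-1j * (1 - s) * dt * H0) @ psi
    psi = expm(-1j * s * dt * Hp) @ psi

probs = np.abs(psi) ** 2
# For a slow enough schedule, nearly all weight ends on the problem's ground states.
print("P(|000>) + P(|111>) =", probs[0] + probs[-1])
```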
The key is controllability. Qubits, like other physical objects in nature, have a resonance frequency, and can be addressed individually with short voltage and current pulses. In our architecture we can steer this frequency, much like you would tune a radio to a broadcast. We can even tune one qubit to the frequency of another one. By moving qubit frequencies to or away from each other, interactions can be turned on or off. The exchange of quantum information resembles a relay race, where the baton can be handed down when the runners meet.

You can see the algorithm in action below. Any problem is encoded as local “directions” we want qubits to point to - like a weathervane pointing into the wind - and interactions, depicted here as links between the balls. We start by aligning all qubits in the same direction, with the interactions between the qubits turned off - this is the simplest ground state of the system. Next, we turn on interactions and change qubit directions to start evolving towards the energy landscape we wish to solve. The algorithmic steps are implemented with many control pulses, illustrating how the problem gets solved in a giant dance of quantum entanglement.
Top: Depiction of the problem, with the gold arrows in the blue balls representing the directions we’d like each qubit to align to, like a weathervane pointing to the wind. The thickness of the link between the balls indicates the strength of the interaction - red denotes a ferromagnetic link, and blue an antiferromagnetic link. Middle: Implementation with qubits (yellow crosses) with control pulses (red) and steering the frequency (vertical direction). Qubits turn blue when there is interaction. The qubits turn green when they are being measured. Bottom: Zoom in of the physical device, showing the corresponding nine qubits (cross-shaped).
To run the adiabatic quantum computation efficiently and design a set of test experiments we teamed up with the QUTIS group at the University of the Basque Country in Bilbao, Spain, led by Prof. E. Solano and Dr. L. Lamata, who are experts in synthesizing digital algorithms. It’s the largest digital algorithm to date, with up to nine qubits and using over one thousand logic gates.

The crucial advantage for the future is that this digital implementation is fully compatible with known quantum error correction techniques, and can therefore be protected from the effects of noise. Otherwise, the noise will set a hard limit, as even the slightest amount can derail the state from following the fragile path to the solution. Since each quantum bit and interaction element can add noise to the system, some of the most important problems are well beyond reach, as they have many degrees of freedom and need a high connectivity. But with error correction, this approach becomes a general-purpose algorithm which can be scaled to an arbitrarily large quantum computer.

Helping webmasters re-secure their sites



Every week, over 10 million users encounter harmful websites that deliver malware and scams. Many of these sites are compromised personal blogs or small business pages that have fallen victim due to a weak password or outdated software. Safe Browsing and Google Search protect visitors from dangerous content by displaying browser warnings and labeling search results with ‘this site may harm your computer’. While this helps keep users safe in the moment, the compromised site remains a problem that needs to be fixed.

Unfortunately, many webmasters of compromised sites are unaware anything is amiss. Worse yet, even when they learn of an incident, they may lack the security expertise to take action and address the root cause of the compromise. Quoting one webmaster from a survey we conducted: “our daily and weekly backups were both infected.” Even after seeking the help of a specialist and “lots of wasted hours/days,” the webmaster abandoned all attempts to restore the site and instead refocused his efforts on “rebuilding the site from scratch.”

In order to find the best way to help webmasters clean up after a compromise, we recently teamed up with the University of California, Berkeley to explore how to quickly contact webmasters and expedite recovery while minimizing the distress involved. We’ve summarized our key lessons below. The full study, which you can read here, was recently presented at the International World Wide Web Conference.

When Google works directly with webmasters during critical moments like security breaches, we can help 75% of webmasters re-secure their content. The whole process takes a median of 3 days. This is a better experience for webmasters and their audience.

How many sites get compromised?
Number of freshly compromised sites Google detects every week.
Over the last year Google detected nearly 800,000 compromised websites—roughly 16,500 new sites every week from around the globe. Visitors to these sites are exposed to low-quality scam content and malware via drive-by downloads. While browser and search warnings help protect visitors from harm, these warnings can at times feel punitive to webmasters who learn only after-the-fact that their site was compromised. To balance the safety of our users with the experience of webmasters, we set out to find the best approach to help webmasters recover from security breaches and ultimately reconnect websites with their audience.

Finding the most effective ways to aid webmasters
  1. Getting in touch with webmasters: One of the hardest steps on the road to recovery is first getting in contact with webmasters. We tried three notification channels: email, browser warnings, and search warnings. For webmasters who proactively registered their site with Search Console, we found that email communication led to 75% of webmasters re-securing their pages. When we didn’t know a webmaster’s email address, browser warnings and search warnings helped 54% and 43% of sites clean up respectively.
  2. Providing tips on cleaning up harmful content: Attackers rely on hidden files, easy-to-miss redirects, and remote inclusions to serve scams and malware. This makes clean-up increasingly tricky. When we emailed webmasters, we included tips and samples of exactly which pages contained harmful content. This, combined with expedited notification, helped webmasters clean up 62% faster compared to no tips—usually within 3 days.
  3. Making sure sites stay clean: Once a site is no longer serving harmful content, it’s important to make sure attackers don’t reassert control. We monitored recently cleaned websites and found 12% were compromised again in 30 days. This illustrates the challenge involved in identifying the root cause of a breach versus dealing with the side-effects.
Making security issues less painful for webmasters—and everyone

We hope that webmasters never have to deal with a security incident. If you are a webmaster, there are some quick steps you can take to reduce your risk. We’ve made it easier to receive security notifications through Google Analytics as well as through Search Console. Make sure to register for both services. Also, we have laid out helpful tips for updating your site’s software and adding additional authentication that will make your site safer.

If you’re a hosting provider or building a service that needs to notify victims of compromise, understand that the entire process is distressing for users. Establish a reliable communication channel before a security incident occurs, make sure to provide victims with clear recovery steps, and promptly reply to inquiries so the process feels helpful, not punitive.

As we work to make the web a safer place, we think it’s critical to empower webmasters and users to make good security decisions. It’s easy for the security community to be pessimistic about incident response being ‘too complex’ for victims, but as our findings demonstrate, even just starting a dialogue can significantly expedite recovery.

An Update on fast Transit Routing with Transfer Patterns



What is the best way to get from A to B by public transit? Google Maps is answering such queries for over 20,000 cities and towns in over 70 countries around the world, including large metro areas like New York, São Paulo or Moscow, and some complete countries, such as Japan or Great Britain.
Since its beginnings in 2005 with the single city of Portland, Oregon, the number of cities and countries served by Google’s public transit directions has been growing rapidly. With more and larger regions, the amount of data we need to search in order to provide optimal directions has grown as well. In 2010, the search speed of transit directions made a leap ahead of that growth and became fast enough to update the result while you drag the endpoints. The technique behind that speed-up is the Transfer Patterns algorithm [1], which was created at Google’s engineering office in Zurich, Switzerland, by visiting researcher Hannah Bast and a number of Google engineers.

I am happy to report that this research collaboration has continued and expanded with the Google Focused Research Award on Next-Generation Route Planning. Over the past three years, this grant has supported Hannah Bast’s research group at the University of Freiburg, as well as the research groups of Peter Sanders and Dorothea Wagner at the Karlsruhe Institute of Technology (KIT).

From the project’s numerous outcomes, I’d like to highlight two recent ones that re-examine the Transfer Patterns approach and massively improve it for continent-sized networks: Scalable Transfer Patterns [2] and Frequency-Based Search for Public Transit [3] by Hannah Bast, Sabine Storandt and Matthias Hertel. This blog post presents the results from these publications.

The notion of a transfer pattern is easy to understand. Suppose you are at a transit stop downtown, call it A, and want to go to some stop B as quickly as possible. Suppose further you brought a printed schedule book but no smartphone. (This sounded plausible only a few years ago!) As a local, you might know that there are only two reasonable options:
  1. Take a tram from A to C, then transfer at C to a bus to B.
  2. Take the direct bus from A to B, which only runs infrequently.
We say the first option has transfer pattern A-C-B, and the second option has transfer pattern A-B. Notice that no in-between stops are mentioned. This is very compact information, much less than the actual schedules, but it makes looking up the schedules significantly faster: Knowing that all optimal trips follow one of these patterns, you only need to look at those lines in the schedule book that provide direct connections from A to C, C to B and A to B. All other lines can safely be ignored: you know you will not miss a better option.
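To illustrate how little work is left at query time once the transfer patterns are known, here is a small Python sketch with a hypothetical toy timetable. It simply follows each pattern leg by leg through the direct-connection schedules; for simplicity it boards the earliest feasible departure on each leg, whereas a real planner compares all ride combinations.

```python
# Hypothetical toy timetable: direct connections as (departure, arrival) time pairs.
direct = {
    ("A", "C"): [(10, 20), (30, 40)],   # tram
    ("C", "B"): [(25, 50), (45, 70)],   # bus
    ("A", "B"): [(15, 80)],             # the infrequent direct bus
}
patterns = [["A", "C", "B"], ["A", "B"]]  # transfer patterns for the query A -> B

def earliest_arrival(pattern, start_time):
    """Follow one transfer pattern, boarding the earliest feasible ride per leg."""
    t = start_time
    for u, v in zip(pattern, pattern[1:]):
        rides = [(dep, arr) for dep, arr in direct[(u, v)] if dep >= t]
        if not rides:
            return None
        t = min(rides)[1]
    return t

arrivals = [a for p in patterns if (a := earliest_arrival(p, 9)) is not None]
print(min(arrivals))  # 50: A->C->B (arrive 50) beats the direct bus (arrive 80)
```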

While the basic idea of transfer patterns is indeed that simple, it takes more to make it work in practice. The transfer patterns of all optimal trips have to be computed ahead of time and stored, so that they are available to answer queries. Conceptually, we need transfer patterns for every pair of stops, because any pair could come up in a query. It is perfectly reasonable to compute them for all pairs within one city, or even one metro area that is densely criss-crossed by a transit network comprising, say, a thousand stops, yielding a million pairs to consider.

As the scale of the problem increases from one metro area to an entire country or continent, this “all pairs” approach rapidly becomes expensive: ten thousand stops (10x more than above) already yield a hundred million pairs (100x more than above), and so on. Also, the individual transfer patterns become quite repetitive: For example, from any stop in Paris, France to any stop in Cologne, Germany, all optimal connections end up using the same few long-distance train lines in the middle, only the local connections to the railway stations depend on the specific pair of stops considered.

However, designated long-distance connections are not the only way to travel between different local networks – they also overlap and connect to each other. For mid-range trips, there is no universally correct rule when to choose a long-distance train or intercity bus, short of actually comparing options with local or regional transit, too.

The Scalable Transfer Patterns algorithm [2] does just that, but in a smart way. For starters, it uses what is known as graph clustering to cut the network into pieces, called clusters, that have a lot of connections inside but relatively few to the outside. As an example, the figure below (kindly provided by the authors) shows a partitioning of Germany into clusters. The stops highlighted in red are border stops: They connect directly to stops outside the cluster. Notice how they are a small fraction of the total network.
The public transit network of Germany (dots and lines), split into clusters (shown in various colors). Of all 251,763 stops, only 10,886 (4.32%) are boundary stops, highlighted as red boxes. Click here to view the full resolution image.[source: S. Storandt, 2016]
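The border stops themselves are easy to identify once a clustering is fixed, as this toy Python sketch shows (the network and cluster assignment here are hypothetical):

```python
# Hypothetical toy network: stop -> directly connected stops, plus a cluster id
# per stop as produced by some graph clustering.
edges = {"a": {"b", "x"}, "b": {"a"}, "x": {"a", "y"}, "y": {"x"}}
cluster = {"a": 1, "b": 1, "x": 2, "y": 2}

# A border stop is any stop with a direct connection into another cluster.
border = {s for s, nbrs in edges.items()
          if any(cluster[n] != cluster[s] for n in nbrs)}
print(border)  # {'a', 'x'}
```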
Based on the clustering, the transfer patterns of all optimal connections are computed in two steps.

In step 1, transfer patterns are computed for optimal connections inside each cluster. They are stored for query processing later on, but they also accelerate the search through a cluster in the following step: between the stops on its border, we only need to consider the connections captured in the transfer patterns.

The next figure sketches how the transit network in the cluster around Berlin gets reduced to much fewer connections between border stations. (The central station stands out as a hub, as expected. It is a border station itself, because it has direct connections out of the cluster.)
The cluster of public transit connections around Berlin (shown as dots and lines in light blue), its border stops (highlighted as red boxes), and the transfer patterns of optimal connections between border stops (thick black lines; only the most important 111 of 592 are shown to keep the image legible). This cuts out 96.15% of the network (especially a lot of the high-frequency inner city trips, which makes the time savings even bigger). Click here to view the full resolution image. [source: S. Storandt, 2016]
In step 2, transfer patterns can be computed for the entire network, that is, between any pair of clusters. This is done with the following twists:

  • It suffices to consider trips from and to boundary stops of any cluster; the local transfer patterns from step 1 will supply the missing pieces later on.
  • The per-cluster transfer patterns from step 1 greatly accelerate the search across other clusters.
  • The search stops exploring any possible connection between two boundary stops as soon as it gets worse than a connection that sticks to long-distance transit between clusters (which may not always be optimal, but is always quick to compute).

The results of steps 1 and 2 are stored and used to answer queries. For any given query from some A to some B, one can now easily stitch together a network of transfer patterns that covers all optimal connections from A to B. Looking up the direct connections on that small network (like in the introductory example) and finding the best one for the queried time is very fast, even if A and B are far away.

The total storage space needed for this is much smaller than the space that would be needed for all pairs of stops, all the more so the larger the network gets. Extrapolating from their experiments, the researchers estimate [2] that Scalable Transfer Patterns for the whole world could be stored in 30 GB, cutting their estimate for the basic Transfer Patterns by a factor of a thousand(!). This is considerably more powerful than the “hub station” idea from the original Transfer Patterns paper [1].

The time needed to compute Scalable Transfer Patterns is also estimated to shrink by three orders of magnitude: At a high level, the earlier phases of the algorithm accelerate the later ones, as described above. At a low level, a second optimization technique kicks in: exploiting the repetitiveness of schedules in time. Recall that finding transfer patterns is all about finding the optimal connections between pairs of stops at any possible departure time.

Frequency-based schedules (e.g., one bus every 10 minutes) cause a lot of similarity during the day, although it often doesn’t match up between lines (e.g., said bus runs every 10 minutes before 6pm and every 20 minutes after, and we seek connections to a train that runs every 12 minutes before 8pm and every 30 minutes after). Moreover, this similarity also exists from one day to the next, and we need to consider all permissible departure dates.
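To give a flavor of how repetitive schedules can be compressed, here is a small Python sketch that represents a line's departures as compact (first departure, headway, count) blocks instead of listing every trip. The times are made up, and the actual frequency-based search in [3] is considerably more sophisticated.

```python
from dataclasses import dataclass

@dataclass
class Frequency:
    first_dep: int  # minutes after midnight
    headway: int    # minutes between departures
    count: int      # number of departures in this block

    def next_departure(self, t):
        """Earliest departure at or after time t, or None if the block is over."""
        if t <= self.first_dep:
            return self.first_dep
        k = -(-(t - self.first_dep) // self.headway)  # ceiling division
        return self.first_dep + k * self.headway if k < self.count else None

# A bus every 10 minutes from 6:00 to 18:00, then every 20 minutes until 21:50,
# stored as two compact blocks instead of 85 individual departures.
service = [Frequency(360, 10, 73), Frequency(1090, 20, 12)]
print(min(d for f in service if (d := f.next_departure(1000)) is not None))  # 1000
```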

The Frequency-Based Search for Public Transit [3] is carefully designed to find and take advantage of repetitive schedules while representing all one-off cases exactly. Compared to the set-up from the original Transfer Patterns paper [1], the authors estimate a whopping 60x acceleration of finding transfer patterns from this part alone.

I am excited to see that the scalability questions originally left open by [1] have been answered so convincingly as part of this Focused Research Award. Please see the list of publications on the project’s website for more outcomes of this award. Besides more on transfer patterns, they contain a wealth of other results about routing on road networks, transit networks, and with combinations of travel modes.

References:

[1] Fast Routing in Very Large Public Transportation Networks Using Transfer Patterns
by H. Bast, E. Carlsson, A. Eigenwillig, R. Geisberger, C. Harrelson, V. Raychev and F. Viger
(ESA 2010). [doi]

[2] Scalable Transfer Patterns
by H. Bast, M. Hertel and S. Storandt (ALENEX 2016). [doi]

[3] Frequency-based Search for Public Transit
by H. Bast and S. Storandt (SIGSPATIAL 2014). [doi]

When can Quantum Annealing win?



During the last two years, the Google Quantum AI team has made progress in understanding the physics governing quantum annealers. We recently applied these new insights to construct proof-of-principle optimization problems and programmed these into the D-Wave 2X quantum annealer that Google operates jointly with NASA. The problems were designed to demonstrate that quantum annealing can offer runtime advantages for hard optimization problems characterized by rugged energy landscapes.

We found that for problem instances involving nearly 1000 binary variables, quantum annealing significantly outperforms its classical counterpart, simulated annealing. It is more than 10^8 times faster than simulated annealing running on a single core. We also compared the quantum hardware to another algorithm called Quantum Monte Carlo. This is a method designed to emulate the behavior of quantum systems, but it runs on conventional processors. While the scaling with size between these two methods is comparable, they are again separated by a large factor, sometimes as high as 10^8.
Time to find the optimal solution with 99% probability for different problem sizes. We compare Simulated Annealing (SA), Quantum Monte Carlo (QMC) and D-Wave 2X. Shown are the 50, 75 and 85 percentiles over a set of 100 instances. We observed a speedup of many orders of magnitude for the D-Wave 2X quantum annealer for this optimization problem characterized by rugged energy landscapes. For such problems quantum tunneling is a useful computational resource to traverse tall and narrow energy barriers.
While these results are intriguing and very encouraging, there is more work ahead to turn quantum-enhanced optimization into a practical technology. The design of next-generation annealers must facilitate the embedding of problems of practical relevance. For instance, we would like to increase the density and control precision of the connections between the qubits, as well as their coherence. Another enhancement we wish to engineer is support for representing not only quadratic optimization, but higher-order optimization as well. This necessitates that not only pairs of qubits, but also larger sets of qubits, can interact directly. Our quantum hardware group is working on these improvements, which will make it easier for users to input hard optimization problems. For higher-order optimization problems, rugged energy landscapes will become typical. Problems with such landscapes stand to benefit from quantum optimization, because quantum tunneling makes it easier to traverse tall and narrow energy barriers.

We should note that there are algorithms, such as techniques based on cluster finding, that can exploit the sparse qubit connectivity in the current generation of D-Wave processors and still solve our proof-of-principle problems faster than the current quantum hardware. But due to the denser connectivity of next-generation annealers, we expect those methods will become ineffective. Also, in our experience we find that lean stochastic local search techniques such as simulated annealing are often the most competitive for hard problems with little structure to exploit. Therefore, we regard simulated annealing as the generic classical competition that quantum annealing needs to beat. We are optimistic that the significant runtime gains we have found will carry over to commercially relevant problems as they occur in tasks relevant to machine intelligence.
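As a reference point for what this classical competition looks like, here is a minimal Python sketch of single-spin-flip simulated annealing on a toy Ising instance. The couplings, biases, and cooling schedule are illustrative; production solvers are far more carefully tuned.

```python
import math, random

def ising_energy(s, J, h):
    """E(s) = -sum_{(i,j)} J[i,j] * s_i * s_j - sum_i h_i * s_i."""
    e = -sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())
    return e - sum(hi * si for hi, si in zip(h, s))

def simulated_annealing(J, h, n, sweeps=2000, T0=3.0, T1=0.05):
    """Single-spin-flip Metropolis search with a geometric cooling schedule."""
    s = [random.choice((-1, 1)) for _ in range(n)]
    for t in range(sweeps):
        T = T0 * (T1 / T0) ** (t / (sweeps - 1))
        for i in range(n):
            # Local field acting on spin i from its couplers and bias.
            f = h[i] + sum(Jij * s[b] for (a, b), Jij in J.items() if a == i) \
                     + sum(Jij * s[a] for (a, b), Jij in J.items() if b == i)
            dE = 2 * s[i] * f  # energy change if spin i is flipped
            if dE <= 0 or random.random() < math.exp(-dE / T):
                s[i] = -s[i]
    return s

# Toy instance: a frustrated 4-spin ring (one antiferromagnetic coupler).
J = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 3): -1.0}
h = [0.0, 0.1, 0.0, 0.0]
s = simulated_annealing(J, h, 4)
print(s, ising_energy(s, J, h))
```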

For details please refer to http://arxiv.org/abs/1512.02206.