Tag Archives: translate

Tune in for the world’s first Google Translate music tour

Eleven years ago, Google Translate was created to break down language barriers. Since then, it has enabled billions of people and businesses all over the world to talk, connect and understand each other in new ways.

And we’re still re-imagining how it can be used—most recently, with music. Sweden’s music industry is one of the world’s most successful exporters of English-language hits, with artists such as ABBA, The Cardigans and Avicii originating from the country. But there are still many talented Swedish artists who may never get the recognition or success they deserve outside a small country up in the north.

This sparked an idea: might it be possible to use Google Translate with the sole purpose of breaking a Swedish band internationally?

Today, we’re presenting Translate Tour, in which up-and-coming Swedish indie pop group Vita Bergen will use Google Translate to perform their new single “Tänd Ljusen” in three different languages—English, Spanish and French—on the streets of three European cities. In just a couple of days, the band will set off to London, Paris and Madrid to sing their locally adapted songs in public, with the aim of spreading Swedish music culture and inviting people all over the world to tune into the band’s cross-European indie pop.


William Hellström from Vita Bergen will be performing his song in English, Spanish and French.

Last year, Google Translate switched from phrase-based translation to Google Neural Machine Translation, which means the tool now translates whole sentences at a time, rather than just piece by piece. It uses this broader context to figure out the most relevant translation, which it then rearranges and adjusts to read more like a human speaking with proper grammar.

Using this updated version of Google Translate, the English, Spanish and French translations of the song were close to flawless. The translations will also continue to improve as the system learns from more people using it.

Tune in to Vita Bergen’s release event, live streamed on YouTube today at 5:00 p.m. CEST, or listen to the songs in Swedish (“Tänd Ljusen”), English (“Light the Lights”), Spanish (“Enciende las Luces”) and French (“Allumez les Lumières”).

Source: Translate


Accelerating Deep Learning Research with the Tensor2Tensor Library



Deep Learning (DL) has enabled the rapid advancement of many useful technologies, such as machine translation, speech recognition and object detection. In the research community, one can often find code open-sourced by the authors to help replicate their results and further advance deep learning. However, most of these DL systems use unique setups that require significant engineering effort and may only work for a specific problem or architecture, making it hard to run new experiments and compare results.

Today, we are happy to release Tensor2Tensor (T2T), an open-source system for training deep learning models in TensorFlow. T2T facilitates the creation of state-of-the-art models for a wide variety of ML applications, such as translation, parsing, image captioning and more, enabling the exploration of various ideas much faster than previously possible. This release also includes a library of datasets and models, including the best models from a few recent papers (Attention Is All You Need, Depthwise Separable Convolutions for Neural Machine Translation and One Model to Learn Them All) to help kick-start your own DL research.

Translation Model | Training time | BLEU (difference from baseline)
----------------- | ----------------- | -------------------------------
Transformer (T2T) | 3 days on 8 GPUs | 28.4 (+7.8)
SliceNet (T2T) | 6 days on 32 GPUs | 26.1 (+5.5)
GNMT+MoE | 1 day on 64 GPUs | 26.0 (+5.4)
ConvS2S | 18 days on 1 GPU | 25.1 (+4.5)
GNMT | 1 day on 96 GPUs | 24.6 (+4.0)
ByteNet | 8 days on 32 GPUs | 23.8 (+3.2)
MOSES (phrase-based baseline) | N/A | 20.6 (+0.0)

BLEU scores (higher is better) on the standard WMT English-German translation task.
As an example of the kind of improvements T2T can offer, we applied the library to machine translation. As you can see in the table above, two different T2T models, SliceNet and Transformer, outperform the previous state-of-the-art, GNMT+MoE. Our best T2T model, Transformer, is 3.8 points better than the standard GNMT model, which itself was 4 points above the baseline phrase-based translation system, MOSES. Notably, with T2T you can approach previous state-of-the-art results with a single GPU in one day: a small Transformer model (not shown above) gets 24.9 BLEU after 1 day of training on a single GPU. Now everyone with a GPU can tinker with great translation models on their own: our GitHub repo has instructions on how to do that.
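The “difference from baseline” column in the table is plain arithmetic: each model’s BLEU score minus the 20.6 of the MOSES phrase-based baseline. A quick sanity check in Python (scores copied from the table; only a subset of rows shown):

```python
# Each "(+x.x)" entry is the model's BLEU minus the MOSES baseline.
bleu = {
    "Transformer (T2T)": 28.4,
    "SliceNet (T2T)": 26.1,
    "GNMT": 24.6,
    "MOSES (baseline)": 20.6,
}
baseline = bleu["MOSES (baseline)"]
diffs = {model: round(score - baseline, 1) for model, score in bleu.items()}
print(diffs)  # Transformer is +7.8 over MOSES, and 3.8 points above GNMT
```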

Modular Multi-Task Training
The T2T library is built with familiar TensorFlow tools and defines multiple pieces needed in a deep learning system: data-sets, model architectures, optimizers, learning rate decay schemes, hyperparameters, and so on. Crucially, it enforces a standard interface between all these parts and implements current ML best practices. So you can pick any data-set, model, optimizer and set of hyperparameters, and run the training to check how it performs. We made the architecture modular, so every piece between the input data and the predicted output is a tensor-to-tensor function. If you have a new idea for the model architecture, you don’t need to replace the whole setup. You can keep the embedding part and the loss and everything else, just replace the model body by your own function that takes a tensor as input and returns a tensor.
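As a toy illustration of that modularity (plain NumPy, not the actual T2T API), the fixed pipeline below keeps the embedding step and treats the model body as an interchangeable tensor-to-tensor function:

```python
import numpy as np

def embed(token_ids, vocab_size=16, dim=8, seed=0):
    # Fixed embedding stage: token ids -> (seq_len, dim) tensor.
    rng = np.random.default_rng(seed)
    table = rng.normal(size=(vocab_size, dim))
    return table[token_ids]

def identity_body(x):
    return x  # simplest possible body: tensor in, same tensor out

def mlp_body(x):
    # A different body, swapped in without touching the rest of the pipeline.
    w = np.ones((x.shape[-1], x.shape[-1])) * 0.1
    return np.maximum(x @ w, 0.0)

def run(body, token_ids):
    # The pipeline only assumes the body maps a tensor to a tensor.
    x = embed(np.asarray(token_ids))
    y = body(x)
    return y.shape

print(run(identity_body, [1, 2, 3]))  # (3, 8)
print(run(mlp_body, [1, 2, 3]))       # (3, 8)
```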

This means that T2T is flexible, with training no longer pinned to a specific model or dataset. It is so easy that even architectures like the famous LSTM sequence-to-sequence model can be defined in a few dozen lines of code. One can also train a single model on multiple tasks from different domains. Taken to the limit, you can even train a single model on all data-sets concurrently, and we are happy to report that our MultiModel, trained like this and included in T2T, yields good results on many tasks even when training jointly on ImageNet (image classification), MS COCO (image captioning), WSJ (speech recognition), WMT (translation) and the Penn Treebank parsing corpus. It is the first time a single model has been demonstrated to be able to perform all these tasks at once.
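The multi-task idea can be sketched as one shared body serving several tasks, each with a small task-specific output head; the task names and shapes below are invented for illustration and are not MultiModel’s actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
shared_w = rng.normal(size=(8, 8))          # parameters shared by every task
heads = {
    "translate": rng.normal(size=(8, 4)),   # e.g. target-vocab logits
    "parse":     rng.normal(size=(8, 3)),   # e.g. bracket-label logits
}

def forward(task, x):
    h = np.tanh(x @ shared_w)   # shared body, trained jointly on all tasks
    return h @ heads[task]      # per-task head selects the output space

x = rng.normal(size=(5, 8))     # a batch of 5 feature vectors
print(forward("translate", x).shape)  # (5, 4)
print(forward("parse", x).shape)      # (5, 3)
```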

Built-in Best Practices
With this initial release, we also provide scripts to generate a number of data-sets widely used in the research community[1], a handful of models[2], a number of hyperparameter configurations, and well-performing implementations of other important tricks of the trade. It is hard to list them all, but if you decide to run your model with T2T you get for free: correct padding of sequences and the corresponding cross-entropy loss, well-tuned parameters for the Adam optimizer, adaptive batching, synchronous distributed training, well-tuned data augmentation for images, label smoothing, and a number of hyperparameter configurations that worked very well for us, including the ones mentioned above that achieve the state-of-the-art results on translation and may help you get good results too.
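Two of those tricks, label smoothing and masking padded positions out of the cross-entropy loss, can be sketched in a few lines of NumPy (values illustrative, not T2T’s internals):

```python
import numpy as np

def smoothed_targets(labels, vocab, eps=0.1):
    # Label smoothing: spread eps of the probability mass uniformly
    # over the vocabulary instead of using one-hot targets.
    t = np.full((len(labels), vocab), eps / vocab)
    t[np.arange(len(labels)), labels] += 1.0 - eps
    return t

def padded_xent(logits, labels, pad_id=0, eps=0.1):
    vocab = logits.shape[-1]
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)            # softmax
    targets = smoothed_targets(labels, vocab, eps)
    loss = -(targets * np.log(probs)).sum(-1)        # per-position loss
    mask = (labels != pad_id).astype(float)          # ignore padding tokens
    return (loss * mask).sum() / mask.sum()

logits = np.zeros((4, 5))            # uniform predictions over 5 classes
labels = np.array([2, 3, 1, 0])      # last position is padding (id 0)
print(round(padded_xent(logits, labels), 4))  # 1.6094, i.e. -log(0.2)
```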

As an example, consider the task of parsing English sentences into their grammatical constituency trees. This problem has been studied for decades and competitive methods were developed with a lot of effort. It can be presented as a sequence-to-sequence problem and be solved with neural networks, but it used to require a lot of tuning. With T2T, it took us only a few days to add the parsing data-set generator and adjust our attention transformer model to train on this problem. To our pleasant surprise, we got very good results in only a week:

Parsing Model | F1 score (higher is better)
----------------- | ---------------------------
Transformer (T2T) | 91.3
Dyer et al. | 91.7
Zhu et al. | 90.4
Socher et al. | 90.4
Vinyals & Kaiser et al. | 88.3

Parsing F1 scores on the standard test set, section 23 of the WSJ. We only compare here models trained discriminatively on the Penn Treebank WSJ training set; see the paper for more results.
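To make the sequence-to-sequence framing concrete, here is a minimal sketch (the tree is made up, and this is not T2T code) of linearizing a constituency tree into the bracketed string a seq2seq model would be trained to emit:

```python
def linearize(tree):
    # tree = (label, children), where each child is a sub-tree or a word.
    label, children = tree
    parts = [linearize(c) if isinstance(c, tuple) else c for c in children]
    return "(" + label + " " + " ".join(parts) + ")"

# A tiny hypothetical parse of "John loves Mary".
tree = ("S", [("NP", ["John"]), ("VP", ["loves", ("NP", ["Mary"])])])
print(linearize(tree))
# (S (NP John) (VP loves (NP Mary)))
```

The source side stays the plain sentence, so the parser is trained exactly like a translation model whose “target language” is bracketed trees.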

Contribute to Tensor2Tensor
In addition to exploring existing models and data-sets, you can easily define your own model and add your own data-sets to Tensor2Tensor. We believe the already included models will perform very well for many NLP tasks, so just adding your data-set might lead to interesting results. By making T2T modular, we also make it very easy to contribute your own model and see how it performs on various tasks. In this way the whole community can benefit from a library of baselines and deep learning research can accelerate. So head to our github repository, try the new models, and contribute your own!

Acknowledgements
The release of Tensor2Tensor was only possible thanks to the widespread collaboration of many engineers and researchers. We want to acknowledge here the core team who contributed (in alphabetical order): Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, Jakob Uszkoreit, Ashish Vaswani.



[1] We include a number of data-sets for image classification (MNIST, CIFAR-10, CIFAR-100, ImageNet), image captioning (MS COCO), translation (WMT with multiple languages including English-German and English-French), language modelling (LM1B), parsing (Penn Treebank), natural language inference (SNLI), speech recognition (TIMIT) and algorithmic problems (over a dozen tasks, from reversing through addition and multiplication to algebra). We will be adding more, and we welcome your data-sets too.

[2] Including LSTM sequence-to-sequence RNNs, convolutional networks (including ones with separable convolutions, e.g., Xception), recently researched models like ByteNet or the Neural GPU, and our new state-of-the-art models mentioned in this post, which we will be actively updating in the repository.




Making the internet more inclusive in India

More than 400 million people in India use the internet, and more are coming online every day. But the vast majority of India’s online content is in English, which only 20 percent of the country’s population speaks—meaning most Indians have a hard time finding content and services in their language.

Building for everyone means first and foremost making things work in the languages people speak. That’s why we’ve now brought our new neural machine translation technology to translations between English and nine widely used Indian languages—Hindi, Bengali, Marathi, Gujarati, Punjabi, Tamil, Telugu, Malayalam and Kannada.

Neural machine translation translates full sentences at a time, instead of pieces of a sentence, using this broader context to help it figure out the most relevant translation. The result is higher-quality, more human-sounding translations.

Just like it’s easier to learn a language when you already know a related language, our neural technology speaks each language better when it learns several at a time. For example, we have a whole lot more sample data for Hindi than its relatives Marathi and Bengali, but when we train them all together, the translations for all improve more than if we’d trained each individually.

Left: Phrase-based translation; right: neural machine translation

These improvements to Google Translate in India join several other updates we announced at an event in New Delhi today, including neural machine translation in Chrome and bringing the Rajpal & Sons Hindi dictionary online so it’s easier for Hindi speakers to find word meanings right in search results. All these improvements help make the web more useful for hundreds of millions of Indians, and bring them closer to benefiting from the full value of the internet.

Source: Translate




Even better translations in Chrome, with one tap

Half the world’s webpages are in English, but less than 15 percent of the global population speaks it as a primary or secondary language. It’s no surprise that Chrome’s built-in Translate functionality is one of the most beloved Chrome features. Every day Chrome users translate more than 150 million webpages with just one click or tap.

Last year, Google Translate introduced neural machine translation, which uses deep neural networks to translate entire sentences, rather than just phrases, to figure out the most relevant translation. Since then we’ve been gradually making these improvements available for Chrome’s built-in translation for select language pairs. The result is higher-quality, full-page translations that are more accurate and easier to read.

Today, these neural machine translation improvements are coming to Translate in Chrome for nine more language pairs. Neural machine translation will be used for most pages to and from English for Indonesian and eight Indian languages: Bengali, Gujarati, Kannada, Malayalam, Marathi, Punjabi, Tamil and Telugu. This means higher-quality translations on pages containing everything from song lyrics to news articles to cricket discussions.
From left: A webpage in Indonesian; the page translated into English without neural machine translation; the page translated into English with neural machine translation. As you can see, the translations after neural machine translation are more fluid and natural.

The addition of these nine languages brings the total number of languages enabled with neural machine translation in Chrome to more than 20. You can already translate to and from English for Chinese, French, German, Hebrew, Hindi, Japanese, Korean, Portuguese, Thai, Turkish and Vietnamese, and one-way from Spanish to English.

We’ll bring neural machine translation to even more languages in the future. Until then, learn more about enabling Translate in Chrome in our help center.

Source: Google Chrome

