The Natural Language Decathlon

We introduce the Natural Language Decathlon (decaNLP), a challenge that spans ten NLP tasks. We cast all ten tasks as question answering and present a new multitask question answering network (MQAN) that jointly learns them without any task-specific modules or parameters.

Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, Richard Socher - June 20, 2018
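The casting of every task as question answering can be illustrated with a few (question, context, answer) triples. The prompts and sentences below are hypothetical examples of the framing, not decaNLP's verbatim formulations.

```python
# Illustrative (question, context, answer) triples showing how different
# NLP tasks can all be framed as question answering; the question wording
# here is made up for illustration.
examples = [
    {
        "task": "summarization",
        "question": "What is the summary?",
        "context": "The quick brown fox jumped over the lazy dog near the river.",
        "answer": "A fox jumped over a dog.",
    },
    {
        "task": "sentiment analysis",
        "question": "Is this review positive or negative?",
        "context": "The film was a delight from start to finish.",
        "answer": "positive",
    },
    {
        "task": "machine translation",
        "question": "What is the translation from English to German?",
        "context": "Hello, world.",
        "answer": "Hallo, Welt.",
    },
]

def to_qa_input(example):
    """Concatenate question and context into a single model input string."""
    return f"question: {example['question']} context: {example['context']}"

for ex in examples:
    print(to_qa_input(ex), "->", ex["answer"])
```

Because every task shares this one input/output interface, a single model can be trained on all of them with no task-specific heads.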

A Multi-Discriminator CycleGAN for Unsupervised Non-Parallel Speech Domain Adaptation

We propose a novel generative adversarial network that uses multiple discriminators within a CycleGAN architecture. When evaluated on gender-based split domains, our model trains robustly and generates more natural-sounding speech. It also improves ASR performance, reducing phoneme error rate on TIMIT by an average of 7.41% and word error rate on the Wall Street Journal dataset by 11.10%.

Ehsan Hosseini-Asl - April 5, 2018

A domain-specific language for automated RNN architecture search

We propose a system that learns to generate candidate architectures in a domain-specific language (DSL), ranks these candidates based on previous experience using a learned ranking function, and then compiles and evaluates a subset of them, using the results to improve the overall system.

Martin Schrimpf, Stephen Merity - December 14, 2017

Improving end-to-end speech recognition models

We show that with improved regularization and policy learning, the performance of end-to-end speech models can be significantly improved. In particular, these techniques lead to a 30% relative performance improvement over the baseline on the Wall Street Journal and LibriSpeech datasets.

Yingbo Zhou - December 14, 2017

Interpretable counting for visual question answering

We learn to answer free-form counting questions with an interpretable sequential decision process that enumerates objects in images.

Alex Trott - December 14, 2017

Thinking out loud: hierarchical and interpretable multi-task reinforcement learning

We propose a hierarchical policy network which can reuse previously learned skills alongside and as subcomponents of new skills. It achieves this by discovering the underlying relations between skills.

Tianmin Shu, Caiming Xiong - December 14, 2017

Weighted transformer network for machine translation

We propose the Weighted Transformer, a Transformer with modified attention layers, that not only outperforms the baseline network in BLEU score but also converges 15–40% faster.

Karim Ahmed - November 8, 2017

Fully-parallel text generation for neural machine translation

We introduce a neural machine translation system that can produce an entire sentence at a time in a fully parallel way, overcoming a limitation of all current neural MT models. This means up to 10x lower user wait time, with similar translation quality to the best available word-by-word models.

Jiatao Gu, James Bradbury - November 07, 2017

How to talk to your database

Despite the meteoric rise in the popularity of relational databases, the ability to retrieve information from these databases is limited. This is due, in part, to the need for users to understand powerful but complex structured query languages. In this work, we provide a natural language interface to relational databases. Through this interface, users communicate directly with databases using natural language as opposed to through structured query languages such as SQL.

Victor Zhong - August 29, 2017
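The kind of translation such an interface performs can be shown with a single pair. Both the question and the table schema below are invented for illustration.

```python
# An illustrative natural-language-to-SQL pair of the kind a natural
# language database interface handles; the schema (a hypothetical
# `players` table with a `height_cm` column) is made up.
question = "How many players are taller than 200 cm?"
sql = "SELECT COUNT(*) FROM players WHERE height_cm > 200"

print(question)
print("->", sql)
```

The user writes only the first line; the system produces the second, so no knowledge of SQL is required.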

Learned in translation: contextualized word vectors

Most models in natural language processing (NLP) use pretrained word vectors to represent the meaning of individual words, but, because words rarely appear in isolation, all NLP models must learn how to understand words in context. Most learn to do so in isolation from all other models. We teach a neural network how to contextualize words in the process of learning how to translate English to German. Then, we ask that neural network to provide context vectors (CoVe) to neural networks learning a second NLP task. In our experiments, networks trained with CoVe always achieve better performance.

Bryan McCann - July 31, 2017
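A minimal numeric sketch of how a downstream task consumes CoVe: each token's pretrained word vector is concatenated with the context vector produced by the translation encoder. The encoder below is a random stand-in, not the paper's bi-LSTM, and the dimensions are illustrative.

```python
import numpy as np

# Sketch of the CoVe idea: a pretrained MT encoder maps word vectors to
# context vectors, and downstream models consume [word vector; CoVe].
# The encoder is stubbed with a random linear map; in the real system it
# is the LSTM encoder of an English-to-German translation model.
rng = np.random.default_rng(0)
d_word, d_cove, seq_len = 300, 600, 5

word_vectors = rng.standard_normal((seq_len, d_word))   # e.g. GloVe vectors
W_stub = rng.standard_normal((d_word, d_cove)) * 0.01   # stand-in encoder

def mt_encoder(x):
    """Stand-in for the pretrained translation encoder."""
    return np.tanh(x @ W_stub)

cove = mt_encoder(word_vectors)

# Per-token features for the second NLP task: concatenation of both views.
features = np.concatenate([word_vectors, cove], axis=1)
print(features.shape)
```

The downstream network then trains on `features` instead of raw word vectors, inheriting context learned from translation.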

Your TLDR by an AI: a deep reinforced model for abstractive summarization

The last few decades have witnessed a fundamental change in the challenge of taking in new information. The bottleneck is no longer access to information; now it’s our ability to keep up. We all have to read more and more to keep up-to-date with our jobs, the news, and social media. We’ve looked at how AI can improve people’s work by helping with this information deluge and one potential answer is to have algorithms automatically summarize longer texts.

Romain Paulus - May 11, 2017

Learning when to skim and when to read

Do we always need human-level accuracy on real-world data? Or can we sometimes make do with less? In this blog post we explore how a fast baseline can decide which sentences are easy and which are difficult. By applying expensive classifiers only to the difficult sentences, we can save computational time.

Alexander Rosenberg Johansen - May 17, 2017
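The skim-or-read strategy can be sketched as confidence-based routing: a cheap model handles sentences it is sure about, and only uncertain ones reach the expensive model. Both classifiers below are toy stand-ins and the threshold value is illustrative.

```python
# Minimal sketch of skim-vs-read routing, assuming a cheap classifier
# that returns a confidence and an expensive fallback model. Both are
# toy stand-ins here.

def cheap_classifier(sentence):
    """Toy bag-of-words sentiment scorer returning (label, confidence)."""
    positive = {"good", "great", "excellent"}
    negative = {"bad", "awful", "terrible"}
    words = sentence.lower().split()
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    total = pos + neg
    if total == 0:
        return "positive", 0.5          # no signal: maximally uncertain
    conf = max(pos, neg) / total
    return ("positive" if pos >= neg else "negative"), conf

def expensive_classifier(sentence):
    """Stand-in for a slow, accurate model (e.g. a large neural network)."""
    return "positive"  # placeholder prediction

def classify(sentence, threshold=0.75):
    """Use the cheap model when confident, else fall back to the slow one."""
    label, conf = cheap_classifier(sentence)
    if conf >= threshold:
        return label, "cheap"
    return expensive_classifier(sentence), "expensive"

print(classify("great excellent good"))   # confident: handled cheaply
print(classify("an ambiguous sentence"))  # uncertain: routed to slow model
```

Raising the threshold trades compute for accuracy: more sentences are routed to the expensive model.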

Knowing when to look: adaptive attention via a visual sentinel for image captioning

We introduce a novel adaptive attention encoder-decoder framework, a state-of-the-art image-captioning deep learning model that significantly outperforms all existing systems on the COCO image captioning challenge dataset and Flickr30K.

Caiming Xiong - November 29, 2016

A way out of the odyssey: analyzing and combining recent insights for LSTMs

We propose and analyze a series of augmentations and modifications to LSTM networks resulting in improved performance for text classification datasets.

Shayne Longpre - November 16, 2016

Multiple different natural language processing tasks in a single deep model

We have developed a single deep neural network model which can learn five different natural language processing tasks. Our model achieves state-of-the-art results on syntactic chunking, dependency parsing, semantic relatedness, and textual entailment.

Kazuma Hashimoto - November 11, 2016

State of the art deep learning model for question answering

We introduce the Dynamic Coattention Network, a state-of-the-art question answering deep learning model that significantly outperforms all existing systems on the Stanford Question Answering Dataset.

Victor Zhong, Caiming Xiong - November 07, 2016

New neural network building block allows faster and more accurate text understanding

We published a new neural network building block, called a QRNN, that runs and trains much faster than traditional models due to better parallelism. Although our goal was speed, the new model also performs more accurately on every task we tried it on.

James Bradbury - November 07, 2016
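The source of the QRNN's parallelism can be sketched as follows: candidate vectors and gates are produced in parallel by convolutions over the input (stubbed with random values here), and only a cheap element-wise recurrence runs sequentially. This shows the "f-pooling" variant; dimensions are illustrative.

```python
import numpy as np

# Sketch of the QRNN's element-wise recurrent pooling ("f-pooling").
# In the real model, z (candidates) and f (forget gates) come from
# convolutions over the whole sequence at once; here they are random.
rng = np.random.default_rng(1)
seq_len, hidden = 6, 4

z = np.tanh(rng.standard_normal((seq_len, hidden)))            # candidates
f = 1 / (1 + np.exp(-rng.standard_normal((seq_len, hidden))))  # forget gates

h = np.zeros(hidden)
states = []
for t in range(seq_len):
    # The only sequential step: no matrix multiply, just element-wise ops.
    h = f[t] * h + (1 - f[t]) * z[t]
    states.append(h)

print(np.stack(states).shape)
```

Because the heavy matrix operations all happen in the parallel convolution stage, the sequential loop is far cheaper than an LSTM's per-step matrix multiplies.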

Teaching neural networks to point to improve language modeling and translation

If you were a small child and wanted to ask what an object was but didn't have the vocabulary for it, how would you refer to it? By pointing at it! Surprisingly, neural networks can benefit from the same tactic as a five-year-old to improve on a variety of language tasks.

Stephen Merity - October 26, 2016
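The pointing tactic can be sketched numerically: the final word distribution mixes a softmax over the vocabulary with an attention distribution over words in the recent context, so the model can "point" at a word it has just seen. The attention weights, vocabulary probabilities, and gate value below are made-up numbers chosen only to illustrate the combination.

```python
import numpy as np

# Sketch of mixing a vocabulary softmax with a pointer distribution over
# the context, as in pointer-style language models. All numbers are
# illustrative placeholders.
vocab = ["the", "cat", "sat", "mat", "<unk>"]
context = ["the", "cat", "sat", "on", "the"]

p_vocab = np.array([0.30, 0.10, 0.10, 0.05, 0.45])    # softmax over vocab
attention = np.array([0.10, 0.50, 0.20, 0.05, 0.15])  # over context positions
gate = 0.6  # mass assigned to the vocabulary softmax

# Scatter pointer mass from context positions onto vocabulary words.
p_pointer = np.zeros(len(vocab))
for pos, word in enumerate(context):
    idx = vocab.index(word) if word in vocab else vocab.index("<unk>")
    p_pointer[idx] += attention[pos]

p_final = gate * p_vocab + (1 - gate) * p_pointer
print(dict(zip(vocab, p_final.round(3))))
```

Pointing is especially useful for rare words and names: even if a word is out of vocabulary, the model can still reproduce it by attending to its earlier occurrence.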

The WikiText long-term dependency language modeling dataset

As part of our research into pointer sentinel mixture models, we've created and published the WikiText language modeling dataset produced using over 28k Wikipedia articles.

Stephen Merity - September 26, 2016

MetaMind neural machine translation system for WMT 2016

We integrate promising recent developments in NMT, including subword splitting and back-translation for monolingual data augmentation, and introduce the Y-LSTM, a novel neural translation architecture.

James Bradbury - August 11, 2016

Dynamic memory networks for visual and textual question answering

Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering.

Caiming Xiong - April 04, 2016

New deep learning model understands and answers questions

We published new state of the art results on a variety of natural language processing (NLP) tasks. Our model, which we call the Dynamic Memory Network (DMN), combines two lines of recent work on memory and attention mechanisms in deep learning.

Ankit Kumar - June 25, 2015
