We introduce the Natural Language Decathlon (decaNLP), a challenge that spans ten NLP tasks. We cast all ten tasks as question answering and present a new multitask question answering network (MQAN) that jointly learns them without any task-specific modules or parameters.
Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, Richard Socher - June 20, 2018
We propose a novel generative adversarial network that uses multiple discriminators within a CycleGAN architecture. When evaluated on gender-based split domains, our model shows robust training and generates a more natural sound. It also improves ASR performance, by an average of 7.41% in phoneme error rate on TIMIT and 11.10% in word error rate on the Wall Street Journal dataset.
Ehsan Hosseini-Asl - April 5, 2018
We propose a system that learns to generate candidate architectures in a domain specific language (DSL), ranks these candidate architectures based upon previous experience (our ranking function), and then compiles and evaluates a subset of these architectures which are then used to improve the overall system.
Martin Schrimpf, Stephen Merity - December 14, 2017
We show that with improved regularization and policy learning, the performance of end-to-end speech models can be significantly improved. In particular, these techniques lead to a 30% relative performance improvement over the baseline on the Wall Street Journal and LibriSpeech datasets.
Yingbo Zhou - December 14, 2017
We learn to answer free-form counting questions with an interpretable sequential decision process that enumerates objects in images.
Alex Trott - December 14, 2017
We propose a hierarchical policy network which can reuse previously learned skills alongside and as subcomponents of new skills. It achieves this by discovering the underlying relations between skills.
Tianmin Shu, Caiming Xiong - December 14, 2017
We propose the Weighted Transformer, a Transformer with modified attention layers, that not only outperforms the baseline network in BLEU score but also converges 15-40% faster.
Karim Ahmed - November 8, 2017
We introduce a neural machine translation system that can produce an entire sentence at a time in a fully parallel way, overcoming a limitation of all current neural MT models. This means up to 10x lower user wait time, with similar translation quality to the best available word-by-word models.
Jiatao Gu, James Bradbury - November 07, 2017
Despite the meteoric rise in the popularity of relational databases, the ability to retrieve information from these databases is limited. This is due, in part, to the need for users to understand powerful but complex structured query languages. In this work, we provide a natural language interface to relational databases. Through this interface, users communicate directly with databases using natural language as opposed to through structured query languages such as SQL.
Victor Zhong - August 29, 2017
Most models in natural language processing (NLP) use pretrained word vectors to represent the meaning of individual words, but, because words rarely appear in isolation, all NLP models must learn how to understand words in context. Most learn to do so in isolation from all other models. We teach a neural network how to contextualize words in the process of learning how to translate English to German. Then, we ask that neural network to provide context vectors (CoVe) to neural networks learning a second NLP task. In our experiments, networks trained with CoVe always achieve better performance.
Bryan McCann - July 31, 2017
The last few decades have witnessed a fundamental change in the challenge of taking in new information. The bottleneck is no longer access to information; now it’s our ability to keep up. We all have to read more and more to keep up-to-date with our jobs, the news, and social media. We’ve looked at how AI can improve people’s work by helping with this information deluge and one potential answer is to have algorithms automatically summarize longer texts.
Romain Paulus - May 11, 2017
Do we always need human-level accuracy on real-world data? Or can we sometimes make do with less? In this blog post, we explore how a fast baseline can decide which sentences are easy or difficult. By using expensive classifiers only on the difficult sentences, we can save computation time.
Alexander Rosenberg Johansen - May 17, 2017
We introduce a novel adaptive attention encoder-decoder framework, a state-of-the-art image-captioning deep learning model that significantly outperforms all existing systems on the COCO image captioning challenge data and Flickr30K.
Caiming Xiong - November 29, 2016
We propose and analyze a series of augmentations and modifications to LSTM networks resulting in improved performance for text classification datasets.
Shayne Longpre - November 16, 2016
We have developed a single deep neural network model which can learn five different natural language processing tasks. Our model achieves state-of-the-art results on syntactic chunking, dependency parsing, semantic relatedness, and textual entailment.
Kazuma Hashimoto - November 11, 2016
We introduce the Dynamic Coattention Network, a state-of-the-art question answering deep learning model that significantly outperforms all existing systems on the Stanford Question Answering dataset.
Victor Zhong, Caiming Xiong - November 07, 2016
We published a new neural network building block, called a QRNN, that runs and trains much faster than traditional models due to better parallelism. Although our goal was speed, the new model also performs more accurately on every task we tried it on.
James Bradbury - November 07, 2016
If you were a small child and wanted to ask what an object was but didn't have the vocabulary for it, how would you refer to it? By pointing at it! Surprisingly, neural networks can benefit from the same tactic as a five-year-old to improve on a variety of language tasks.
Stephen Merity - October 26, 2016
As part of our research into pointer sentinel mixture models, we've created and published the WikiText language modeling dataset produced using over 28k Wikipedia articles.
Stephen Merity - September 26, 2016
We integrate promising recent developments in NMT, including subword splitting and back-translation for monolingual data augmentation, and introduce the Y-LSTM, a novel neural translation architecture.
James Bradbury - August 11, 2016
Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering.
Caiming Xiong - April 04, 2016
We published new state-of-the-art results on a variety of natural language processing (NLP) tasks. Our model, which we call the Dynamic Memory Network (DMN), combines two lines of recent work on memory and attention mechanisms in deep learning.
Ankit Kumar - June 25, 2015