Learned in translation: contextualized word vectors

Most models in natural language processing (NLP) use pretrained word vectors to represent the meaning of individual words, but, because words rarely appear in isolation, all NLP models must learn how to understand words in context. Most learn to do so in isolation from all other models. We teach a neural network how to contextualize words in the process of learning how to translate English to German. Then, we ask that neural network to provide context vectors (CoVe) to neural networks learning a second NLP task. In our experiments, networks trained with CoVe always achieve better performance.
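
The recipe amounts to reusing the translation model's encoder: run pretrained word vectors through it and concatenate the resulting context vectors with the original vectors before the downstream model sees them. Below is a minimal sketch of that idea only; `ContextEncoder`, the dimensions, and the random initialization are placeholders rather than the released CoVe model, and loading the pretrained translation weights is omitted.

```python
# Sketch of the CoVe idea: reuse the encoder of a pretrained
# English-to-German translation model as a source of context vectors.
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Stand-in for the machine translation encoder; in practice its
    weights come from a trained translation model, not random init."""
    def __init__(self, glove_dim=300, hidden_dim=300):
        super().__init__()
        self.lstm = nn.LSTM(glove_dim, hidden_dim, num_layers=2,
                            bidirectional=True, batch_first=True)

    def forward(self, glove_vectors):          # (batch, seq_len, 300)
        cove, _ = self.lstm(glove_vectors)     # (batch, seq_len, 600)
        return cove

# Downstream task models consume [GloVe; CoVe] instead of GloVe alone,
# so every word vector now reflects the sentence it appears in.
encoder = ContextEncoder()
glove = torch.randn(8, 20, 300)                        # toy batch of word vectors
features = torch.cat([glove, encoder(glove)], dim=-1)  # (8, 20, 900)
```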

Bryan McCann - July 31, 2017


Your TL;DR by an AI: a deep reinforced model for abstractive summarization

The last few decades have witnessed a fundamental change in the challenge of taking in new information. The bottleneck is no longer access to information; now it’s our ability to keep up. We all have to read more and more to keep up-to-date with our jobs, the news, and social media. We’ve looked at how AI can improve people’s work by helping with this information deluge and one potential answer is to have algorithms automatically summarize longer texts.

Romain Paulus - May 11, 2017


Learning when to skim and when to read

Do we always need human-level accuracy on real-world data? Or can we sometimes make do with less? In this blog post we will explore how a fast baseline can decide which sentences are easy and which are difficult. By using expensive classifiers only on the difficult sentences, we can save computation time.
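
As a rough illustration of the idea (not the post's exact system), the sketch below lets a cheap classifier answer whenever it is confident and forwards only the uncertain "difficult" sentences to an expensive model; `cheap_model`, `expensive_model`, the threshold, and the scikit-learn-style `predict_proba` interface are all assumptions.

```python
# Cheap-first cascade: the fast model handles confident cases, and only
# low-confidence sentences are passed to the slow, accurate model.
import numpy as np

def cascade_predict(sentences, cheap_model, expensive_model, threshold=0.9):
    """Both models are assumed to expose predict_proba(list_of_str)
    returning an (n_sentences, n_classes) array of class probabilities."""
    cheap_probs = cheap_model.predict_proba(sentences)
    labels = cheap_probs.argmax(axis=1)
    confidence = cheap_probs.max(axis=1)

    hard_idx = np.where(confidence < threshold)[0]   # the "difficult" sentences
    if len(hard_idx) > 0:
        hard_sentences = [sentences[i] for i in hard_idx]
        labels[hard_idx] = expensive_model.predict_proba(hard_sentences).argmax(axis=1)
    return labels
```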

Alexander Rosenberg Johansen - May 17, 2017


Knowing when to look: adaptive attention via a visual sentinel for image captioning

We introduce a novel adaptive attention encoder-decoder framework: a state-of-the-art image captioning deep learning model that significantly outperforms all existing systems on the COCO image captioning challenge dataset and on Flickr30K.

Caiming Xiong - November 29, 2016


A way out of the odyssey: analyzing and combining recent insights for LSTMs

We propose and analyze a series of augmentations and modifications to LSTM networks, resulting in improved performance on text classification datasets.

Shayne Longpre - November 16, 2016


Multiple different natural language processing tasks in a single deep model

We have developed a single deep neural network model which can learn five different natural language processing tasks. Our model achieves state-of-the-art results on syntactic chunking, dependency parsing, semantic relatedness, and textual entailment.

Kazuma Hashimoto - November 11, 2016


State of the art deep learning model for question answering

We introduce the Dynamic Coattention Network, a state-of-the-art question answering deep learning model that significantly outperforms all existing systems on the Stanford Question Answering Dataset.

Victor Zhong, Caiming Xiong - November 07, 2016


New neural network building block allows faster and more accurate text understanding

We published a new neural network building block, called a quasi-recurrent neural network (QRNN), that runs and trains much faster than traditional models due to better parallelism. Although our goal was speed, the new model also performs more accurately on every task we tried it on.
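
The intuition behind that parallelism, sketched below with assumed dimensions and none of the released implementation's optimizations: the gates for every timestep come from a convolution that runs in parallel across the sequence, leaving only a cheap element-wise recurrence (the paper's "fo-pooling") as the sequential part.

```python
# Rough QRNN-style layer: parallel convolution for the gates,
# element-wise fo-pooling as the only sequential step.
import torch
import torch.nn as nn

class QRNNLayerSketch(nn.Module):
    def __init__(self, input_dim, hidden_dim, kernel_size=2):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.pad = kernel_size - 1                      # causal padding on the left
        self.conv = nn.Conv1d(input_dim, 3 * hidden_dim, kernel_size)

    def forward(self, x):                               # x: (batch, seq_len, input_dim)
        xc = nn.functional.pad(x.transpose(1, 2), (self.pad, 0))
        z, f, o = self.conv(xc).transpose(1, 2).chunk(3, dim=-1)
        z, f, o = torch.tanh(z), torch.sigmoid(f), torch.sigmoid(o)

        # fo-pooling: sequential, but element-wise and therefore cheap.
        c = torch.zeros(x.size(0), self.hidden_dim, device=x.device)
        outputs = []
        for t in range(x.size(1)):
            c = f[:, t] * c + (1 - f[:, t]) * z[:, t]
            outputs.append(o[:, t] * c)
        return torch.stack(outputs, dim=1)              # (batch, seq_len, hidden_dim)
```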

James Bradbury - November 07, 2016


Teaching neural networks to point to improve language modeling and translation

If you were a small child and wanted to ask what an object was but didn't have the vocabulary for it, how would you refer to it? By pointing at it! Surprisingly, neural networks can benefit from the same tactic as a five-year-old to improve on a variety of language tasks.
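
For intuition only, the toy function below mixes the usual softmax over the vocabulary with an attention distribution over words already seen in the context, so rare words can be reused by "pointing" back at them; the function name, arguments, and the externally supplied gate are hypothetical stand-ins for quantities the pointer sentinel model learns end to end.

```python
# Toy mixture of a vocabulary softmax and a pointer distribution
# over tokens that already appeared in the context.
import torch

def mix_pointer_and_vocab(vocab_logits, attn_scores, context_token_ids, gate):
    """vocab_logits:      (batch, vocab_size) from the usual output layer
    attn_scores:       (batch, context_len) scores over previous tokens
    context_token_ids: (batch, context_len) long tensor of those tokens' ids
    gate:              (batch, 1) in [0, 1], how much to trust the vocab softmax"""
    p_vocab = torch.softmax(vocab_logits, dim=-1)
    p_ptr = torch.softmax(attn_scores, dim=-1)

    # scatter the pointer probabilities onto the vocabulary ids they refer to
    p_copy = torch.zeros_like(p_vocab)
    p_copy.scatter_add_(1, context_token_ids, p_ptr)

    return gate * p_vocab + (1 - gate) * p_copy
```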

Stephen Merity - October 26, 2016


The WikiText long-term dependency language modeling dataset

As part of our research into pointer sentinel mixture models, we've created and published the WikiText language modeling dataset, produced from over 28,000 Wikipedia articles.

Stephen Merity - September 26, 2016


MetaMind neural machine translation system for WMT 2016

We integrate promising recent developments in NMT, including subword splitting and back-translation for monolingual data augmentation, and introduce the Y-LSTM, a novel neural translation architecture.
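
For readers unfamiliar with back-translation, the sketch below shows the general idea under an assumed `translate` interface rather than our actual pipeline: a German-to-English model turns monolingual German text into synthetic English sources, and the resulting pairs are mixed into the parallel training data.

```python
# Back-translation as data augmentation (illustrative only): create
# synthetic (English, German) pairs from monolingual German text.
def back_translate(monolingual_de, de_to_en_model):
    """de_to_en_model.translate(sentence) is assumed to return a string."""
    synthetic_pairs = []
    for de_sentence in monolingual_de:
        en_synthetic = de_to_en_model.translate(de_sentence)
        synthetic_pairs.append((en_synthetic, de_sentence))   # (source, target)
    return synthetic_pairs

# The synthetic pairs are then shuffled together with the real parallel data.
```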

James Bradbury - August 11, 2016


Dynamic memory networks for visual and textual question answering

Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering.

Caiming Xiong - April 04, 2016


New deep learning model understands and answers questions

We published new state-of-the-art results on a variety of natural language processing (NLP) tasks. Our model, which we call the Dynamic Memory Network (DMN), combines two lines of recent work on memory and attention mechanisms in deep learning.

Ankit Kumar - June 25, 2015