Publications

The Right Tool for the Job: Matching Model and Instance Complexities

As NLP models become larger, executing a trained model requires significant computational resources, incurring monetary and …

A Mixture of h-1 Heads is Better than h Heads

Multi-head attentive neural architectures have achieved state-of-the-art results on a variety of natural language processing tasks. …

A Formal Hierarchy of RNN Architectures

We develop a formal hierarchy of the expressive capacity of RNN architectures. The hierarchy is based on two formal properties: …

Extracting a knowledge base of mechanisms from COVID-19 papers

The COVID-19 pandemic has sparked an influx of research by scientists worldwide, leading to a rapidly evolving corpus of …

Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping

Fine-tuning pretrained contextual word embedding models to supervised downstream tasks has become commonplace in natural language …

Show Your Work: Improved Reporting of Experimental Results

Research in natural language processing proceeds, in part, by demonstrating that new models achieve superior performance (e.g., …

RNN Architecture Learning with Sparse Regularization

Neural models for NLP typically use large numbers of parameters to reach state-of-the-art performance, which can lead to excessive …

PaLM: A Hybrid Parser and Language Model

We present PaLM, a hybrid parser and neural language model. Building on an RNN language model, PaLM adds an attention layer over text …

Knowledge Enhanced Contextual Word Representations

Contextual word representations, typically trained on unstructured, unlabeled text, do not contain any explicit grounding to real world …

Inoculation by Fine-Tuning: A Method for Analyzing Challenge Datasets

Several datasets have recently been constructed to expose brittleness in models trained on existing benchmarks. While model performance …

Green AI

The computations required for deep learning research have been doubling every few months, resulting in an estimated 300,000x increase …

SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference

Given a partial description like ‘she opened the hood of the car’, humans can reason about the situation and anticipate …

SoPa: Bridging CNNs, RNNs, and Weighted Finite-State Machines

Recurrent and convolutional neural networks comprise two distinct families of models that have proven to be useful for encoding natural …

Rational Recurrences

Despite the tremendous empirical success of neural models in natural language processing, many of them lack the strong intuitions that …

LSTMs Exploit Linguistic Attributes of Data

While recurrent neural networks have found success in a variety of natural language processing applications, they are general models of …

Annotation Artifacts in Natural Language Inference Data

Large-scale datasets for natural language inference are created by presenting crowd workers with a sentence (premise), and asking them …

A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications

Peer reviewing is a central component in the scientific publishing process. We present the first public dataset of scientific peer …

The Effect of Different Writing Tasks on Linguistic Style: A Case Study of the ROC Story Cloze Task

A writer’s style depends not just on personal traits but also on her intent and mental state. In this paper, we show how variants …

Story Cloze Task: UW NLP System

This paper describes the University of Washington NLP submission for the Linking Models of Lexical, Sentential and Discourse-level …

Automatic Selection of Context Configurations for Improved (and Fast) Class-Specific Word Representations

This paper is concerned with identifying contexts useful for training word representation models for different word classes such as …

Symmetric Patterns and Coordinations: Fast and Enhanced Representations of Verbs and Adjectives

State-of-the-art word embeddings, which are often trained on bag-of-words (BOW) contexts, provide a high quality representation of …

Symmetric Pattern Based Word Embeddings for Improved Word Similarity Prediction

We present a novel word-level vector representation based on symmetric patterns (SPs). To this end, we automatically acquire SPs (e.g., …

How Well Do Distributional Models Capture Different Types of Semantic Knowledge?

In recent years, distributional models (DMs) have shown great success in representing lexical semantics. In this work we show that the …

Minimally Supervised Classification to Semantic Categories using Automatically Acquired Symmetric Patterns

Classifying nouns into semantic categories (e.g., animals, food) is an important line of research in both cognitive science and natural …

Authorship Attribution of Micro-Messages

Work on authorship attribution has traditionally focused on long texts. In this work, we tackle the question of whether the author of a …

Learnability-based Syntactic Annotation Design

There is often more than one way to represent syntactic structures, even within a given formalism. Selecting one representation over …

Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation

Dependency parsing is a central NLP task. In this paper we show that the common evaluation for unsupervised dependency parsing is …