Publications

Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models

Text-to-Image (T2I) models often suffer from issues such as semantic leakage, incorrect feature binding, and omissions of key concepts …

Guy Kaplan, Michael Toker, Yuval Reif, Yonatan Belinkov, Roy Schwartz

arXiv:2504.01137.

PDF

On Pruning State-Space LLMs

Recent work proposed state-space models (SSMs) as an efficient alternative to transformer-based LLMs. Can these models be pruned to …

Tamer Ghattas, Michael Hassid, Roy Schwartz

arXiv:2502.18886.

PDF Code

Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies

Accelerating the inference of large language models (LLMs) is a critical challenge in generative AI. Speculative decoding (SD) methods …

Nadav Timor, Jonathan Mamou, Daniel Korat, Moshe Berchansky, Oren Pereg, Gaurav Jain, Roy Schwartz, Moshe Wasserblat, David Harel

arXiv:2502.05202.

PDF

From Tokens to Words: On the Inner Lexicon of LLMs

Natural language is composed of words, but modern LLMs process sub-words as input. A natural question raised by this discrepancy is …

Guy Kaplan, Matanel Oren, Yuval Reif, Roy Schwartz

In Proc. of ICLR 2025.

PDF Project

Attend First, Consolidate Later: On the Importance of Attention in Different LLM Layers

In decoder-based LLMs, the representation of a given layer serves two purposes: as input to the next layer during the computation of …

Amit Ben Artzy, Roy Schwartz

In Proc. of BlackboxNLP 2024.

PDF

What Can Natural Language Processing Do for Peer Review?

The number of scientific articles produced every year is growing rapidly. Providing quality control over them is crucial for scientists …

Ilia Kuznetsov, Osama Mohammed Afzal, Koen Dercksen, Nils Dycke, Alexander Goldberg, Tom Hope, Dirk Hovy, Jonathan K. Kummerfeld, Anne Lauscher, Kevin Leyton-Brown, Sheng Lu, Mausam, Margot Mieskes, Aurélie Névéol, Danish Pruthi, Lizhen Qu, Roy Schwartz, Noah A. Smith, Thamar Solorio, Jingyan Wang, Xiaodan Zhu, Anna Rogers, Nihar B. Shah, Iryna Gurevych

arXiv:2405.06563.

PDF

Accelerating Speculative Decoding using Dynamic Speculation Length

Speculative decoding is a promising method for reducing the inference latency of large language models. The effectiveness of the method …

Jonathan Mamou, Oren Pereg, Daniel Korat, Moshe Berchansky, Nadav Timor, Moshe Wasserblat, Roy Schwartz

In Proc. of ENLSP 2024.

PDF

The Larger the Better? Improved LLM Code-Generation via Budget Reallocation

It is a common belief that large language models (LLMs) are better than smaller-sized ones. However, larger models also require …

Michael Hassid*, Tal Remez*, Jonas Gehring, Roy Schwartz, Yossi Adi

In Proc. of COLM 2024.

PDF Project

Beyond Performance: Quantifying and Mitigating Label Bias in LLMs

Large language models (LLMs) have shown remarkable adaptability to diverse tasks, by leveraging context prompts containing …

Yuval Reif, Roy Schwartz

In Proc. of NAACL 2024.

PDF

Transformers are Multi-State RNNs

Transformers are considered conceptually different from the previous generation of state-of-the-art NLP models—recurrent neural …

Matanel Oren*, Michael Hassid*, Nir Yarden, Yossi Adi, Roy Schwartz

In Proc. of EMNLP 2024.

PDF Project

Textually Pretrained Speech Language Models

Speech language models (SpeechLMs) process and generate acoustic data only, without textual supervision. In this work, we propose …

Michael Hassid*, Tal Remez*, Tu Anh Nguyen, Itai Gat, Alexis Conneau, Felix Kreuk, Jade Copet, Alexandre Defossez, Gabriel Synnaeve, Emmanuel Dupoux, Roy Schwartz*, Yossi Adi*

In Proc. of NeurIPS 2023.

PDF Project

Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images

Weird, unusual, and uncanny images pique the curiosity of observers because they challenge commonsense. For example, an image released …

Nitzan Bitton Guetta*, Yonatan Bitton*, Jack Hassel, Ludwig Schmidt, Yuval Elovici, Gabi Stanovsky, Roy Schwartz

In Proc. of ICCV 2023.

PDF Project

Read, Look or Listen? What’s Needed for Solving a Multimodal Dataset

The prevalence of large-scale multimodal datasets presents unique challenges in assessing dataset quality. We propose a two-step method …

Netta Madvil, Yonatan Bitton, Roy Schwartz

arXiv:2307.04532.

PDF

Surveying (Dis)Parities and Concerns of Compute Hungry NLP Research

Many recent improvements in NLP stem from the development and use of large pre-trained language models (PLMs) with billions of …

Ji-Ung Lee, Haritz Puerto, Betty van Aken, Yuki Arase, Jessica Zosa Forde, Leon Derczynski, Andreas Rücklé, Iryna Gurevych, Roy Schwartz, Emma Strubell, Jesse Dodge

arXiv:2306.16900.

PDF

Finding the SWEET Spot: Analysis and Improvement of Adaptive Inference in Low Resource Settings

Adaptive inference is a simple method for reducing inference costs. The method works by maintaining multiple classifiers of different …

Daniel Rotem, Michael Hassid, Jonathan Mamou, Roy Schwartz

In Proc. of ACL 2023.

PDF Code

Fighting Bias with Bias: Promoting Model Robustness by Amplifying Dataset Biases

NLP models often rely on superficial cues known as dataset biases to achieve impressive performance, and can fail on examples where …

Yuval Reif, Roy Schwartz

In Findings of ACL 2023.

PDF Code

Curating Datasets for Better Performance with Example Training Dynamics

The landscape of NLP research is dominated by large-scale models training on colossal datasets, relying on data quantity rather than …

Aviad Sar-Shalom, Roy Schwartz

In Findings of ACL 2023.

PDF

Morphosyntactic Probing of Multilingual BERT Models

We introduce an extensive dataset for multilingual probing of morphological information in language models (247 tasks across 42 …

Judit Acs, Endre Hamerlik, Roy Schwartz, Noah A. Smith, Andras Kornai

In Natural Language Engineering 2023.

PDF Code

VASR: Visual Analogies of Situation Recognition

A core process in human cognition is analogical mapping: the ability to identify a similar relational structure between different …

Yonatan Bitton, Ron Yosef, Eli Strugo, Dafna Shahaf, Roy Schwartz, Gabi Stanovsky

In Proc. of AAAI 2023.

PDF Project

How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers

The attention mechanism is considered the backbone of the widely-used Transformer architecture. It contextualizes the input by …

Michael Hassid, Hao Peng, Daniel Rotem, Jungo Kasai, Ivan Montero, Noah A. Smith, Roy Schwartz

In Findings of EMNLP 2022.

PDF Code

Efficient Methods for Natural Language Processing: A Survey

Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; …

Marcos Treviso*, Tianchu Ji*, Ji-Ung Lee*, Betty van Aken, Qingqing Cao, Manuel R. Ciosici, Michael Hassid, Kenneth Heafield, Sara Hooker, Pedro H. Martins, André F. T. Martins, Peter Milder, Colin Raffel, Jessica Forde, Edwin Simpson, Noam Slonim, Jesse Dodge, Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz

In TACL 2023.

PDF

WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models

While vision-and-language models perform well on tasks such as visual question answering, they struggle when it comes to basic human …

Yonatan Bitton*, Nitzan Bitton Guetta*, Ron Yosef, Yuval Elovici, Mohit Bansal, Gabi Stanovsky, Roy Schwartz

In Proc. of NeurIPS 2022 Datasets and Benchmarks Track. Featured presentation.

PDF Project

Fewer Errors, but More Stereotypes? The Effect of Model Size on Gender Bias

The size of pretrained models is increasing, and so does their performance on a variety of NLP tasks. However, as their memorization …

Yarden Tal, Inbal Magar, Roy Schwartz

In Proc. of GeBNLP 2022.

PDF Code

TangoBERT: Reducing Inference Cost by using Cascaded Architecture

The remarkable success of large transformer-based models such as BERT, RoBERTa and XLNet in many NLP tasks comes with a large increase …

Jonathan Mamou, Oren Pereg, Moshe Wasserblat, Roy Schwartz

In Proc. of EMC² 2023.

PDF

On the Limitations of Dataset Balancing: The Lost Battle Against Spurious Correlations

Recent work has shown that deep learning models in NLP are highly sensitive to low-level correlations between simple features and …

Roy Schwartz, Gabi Stanovsky

In Findings of NAACL 2022.

PDF

Measuring the Carbon Intensity of AI in Cloud Instances

The advent of cloud computing has provided people around the world with unprecedented access to computational power and enabled rapid …

Jesse Dodge, Taylor Prewitt, Remi Tachet des Combes, Erika Odmark, Roy Schwartz, Emma Strubell, Sasha Luccioni, Noah A. Smith, Nicole DeCario, Will Buchanan

In Proc. of FAccT 2022.

PDF Video

Data Contamination: From Memorization to Exploitation

Pretrained language models are typically trained on massive web-based datasets, which are often “contaminated” with downstream test …

Inbal Magar, Roy Schwartz

In Proc. of ACL 2022.

PDF Code

ABC: Attention with Bounded-memory Control

Transformer architectures have achieved state-of-the-art results on a variety of natural language processing (NLP) tasks. However, their …

Hao Peng, Jungo Kasai, Nikolaos Pappas, Dani Yogatama, Zhaofeng Wu, Lingpeng Kong, Roy Schwartz, Noah A. Smith

In Proc. of ACL 2022.

PDF

Expected Validation Performance and Estimation of a Random Variable’s Maximum

NLP is often supported by experimental results, and improved reporting of such results can lead to better understanding and more …

Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith

In Findings of EMNLP 2021.

PDF

Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent

The capacity of neural networks like the widely adopted transformer is known to be very high. Evidence is emerging that they learn …

Will Merrill, Vivek Ramanujan, Yoav Goldberg, Roy Schwartz, Noah A. Smith

In Proc. of EMNLP 2021.

PDF Code

Data Efficient Masked Language Modeling for Vision and Language

Masked language modeling (MLM) is one of the key sub-tasks in vision-language pretraining. In the cross-modal setting, tokens in the …

Yonatan Bitton, Gabi Stanovsky, Michael Elhadad, Roy Schwartz

In Findings of EMNLP 2021.

PDF Code Poster Video

Provable Limitations of Acquiring Meaning from Ungrounded Form: What will Future Language Models Understand?

Language models trained on billions of tokens have recently led to unprecedented results on many NLP tasks. This success raises the …

Will Merrill, Yoav Goldberg, Roy Schwartz, Noah A. Smith

In TACL 2021.

PDF

Extracting a Knowledge Base of Mechanisms from COVID-19 Papers

The urgency of mitigating COVID-19 has spawned a large and diverse body of scientific literature that is challenging for researchers to …

Tom Hope*, Aida Amini*, David Wadden, Madeleine van Zuylen, Sravanthi Parasa, Eric Horvitz, Dan Weld, Roy Schwartz, Hannaneh Hajishirzi

In Proc. of NAACL 2021.

PDF Dataset Project

Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA

Recent works have shown that supervised models often exploit data artifacts to achieve good test scores while their performance …

Yonatan Bitton, Gabi Stanovsky, Roy Schwartz, Michael Elhadad

In Proc. of NAACL 2021.

PDF Code Poster Video

Random Feature Attention

Transformers are state-of-the-art models for a variety of sequence modeling tasks. At their core is an attention function which models …

Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong

In Proc. of ICLR 2021. Spotlight presentation.

PDF Video

Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics

Large datasets have become commonplace in NLP research. However, the increased emphasis on data quantity has made it challenging to …

Swabha Swayamdipta, Roy Schwartz, Nicholas Lourie, Yizhong Wang, Hannaneh Hajishirzi, Noah A. Smith, Yejin Choi

In Proc. of EMNLP 2020.

PDF Code Video

Extracting a knowledge base of mechanisms from COVID-19 papers

The COVID-19 pandemic has sparked an influx of research by scientists worldwide, leading to a rapidly evolving corpus of …

Aida Amini*, Tom Hope*, David Wadden, Roy Schwartz, Hannaneh Hajishirzi

In Proc. of SciNLP 2020.

PDF Video

The Right Tool for the Job: Matching Model and Instance Complexities

As NLP models become larger, executing a trained model requires significant computational resources incurring monetary and …

Roy Schwartz, Gabi Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith

In Proc. of ACL 2020.

PDF Code Slides Video

A Mixture of h-1 Heads is Better than h Heads

Multi-head attentive neural architectures have achieved state-of-the-art results on a variety of natural language processing tasks. …

Hao Peng, Roy Schwartz, Dianqi Li, Noah A. Smith

In Proc. of ACL 2020.

PDF Video

A Formal Hierarchy of RNN Architectures

We develop a formal hierarchy of the expressive capacity of RNN architectures. The hierarchy is based around two formal properties: …

Will Merrill, Gail Weiss, Yoav Goldberg, Roy Schwartz, Noah A. Smith, Eran Yahav

In Proc. of ACL 2020.

PDF Video

Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping

Fine-tuning pretrained contextual word embedding models to supervised downstream tasks has become commonplace in natural language …

Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, Noah A. Smith

arXiv:2002.06305.

PDF

Show Your Work: Improved Reporting of Experimental Results

Research in natural language processing proceeds, in part, by demonstrating that new models achieve superior performance (e.g., …

Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith

In Proc. of EMNLP 2019.

PDF Code

RNN Architecture Learning with Sparse Regularization

Neural models for NLP typically use large numbers of parameters to reach state-of-the-art performance, which can lead to excessive …

Jesse Dodge, Roy Schwartz, Hao Peng, Noah A. Smith

In Proc. of EMNLP 2019.

PDF Code

PaLM: A Hybrid Parser and Language Model

We present PaLM, a hybrid parser and neural language model. Building on an RNN language model, PaLM adds an attention layer over text …

Hao Peng, Roy Schwartz, Noah A. Smith

In Proc. of EMNLP 2019.

PDF Code

Knowledge Enhanced Contextual Word Representations

Contextual word representations, typically trained on unstructured, unlabeled text, do not contain any explicit grounding to real world …

Matthew E. Peters, Mark Neumann, Robert Logan, Roy Schwartz, Vidur Joshi, Sameer Singh, Noah A. Smith

In Proc. of EMNLP 2019.

PDF

Green AI

The computations required for deep learning research have been doubling every few months, resulting in an estimated 300,000x increase …

Roy Schwartz*, Jesse Dodge*, Noah A. Smith, Oren Etzioni

In CACM.

PDF Video

Inoculation by Fine-Tuning: A Method for Analyzing Challenge Datasets

Several datasets have recently been constructed to expose brittleness in models trained on existing benchmarks. While model performance …

Nelson F. Liu, Roy Schwartz, Noah A. Smith

In Proc. of NAACL 2019.

PDF Code Slides Video

SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference

Given a partial description like ‘she opened the hood of the car’, humans can reason about the situation and anticipate …

Rowan Zellers, Yonatan Bisk, Roy Schwartz, Yejin Choi

In Proc. of EMNLP 2018.

PDF Code Dataset Project Slides Video

Rational Recurrences

Despite the tremendous empirical success of neural models in natural language processing, many of them lack the strong intuitions that …

Hao Peng, Roy Schwartz, Sam Thomson, Noah A. Smith

In Proc. of EMNLP 2018.

PDF Code Slides Video

SoPa: Bridging CNNs, RNNs, and Weighted Finite-State Machines

Recurrent and convolutional neural networks comprise two distinct families of models that have proven to be useful for encoding natural …

Roy Schwartz*, Sam Thomson*, Noah A. Smith

In Proc. of ACL 2018.

PDF Code Poster

LSTMs Exploit Linguistic Attributes of Data

While recurrent neural networks have found success in a variety of natural language processing applications, they are general models of …

Nelson F. Liu, Omer Levy, Roy Schwartz, Chenhao Tan, Noah A. Smith

In Proc. of RepL4NLP 2018. Best paper award.

PDF Poster Slides

Annotation Artifacts in Natural Language Inference Data

Large-scale datasets for natural language inference are created by presenting crowd workers with a sentence (premise), and asking them …

Suchin Gururangan*, Swabha Swayamdipta*, Omer Levy, Roy Schwartz, Sam Bowman, Noah A. Smith

In Proc. of NAACL 2018.

PDF Poster

A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications

Peer reviewing is a central component in the scientific publishing process. We present the first public dataset of scientific peer …

Dongyeop Kang, Waleed Ammar, Bhavana Dalvi, Madeleine van Zuylen, Sebastian Kohlmeier, Eduard Hovy, Roy Schwartz

In Proc. of NAACL 2018.

PDF Dataset Poster

The Effect of Different Writing Tasks on Linguistic Style: A Case Study of the ROC Story Cloze Task

A writer’s style depends not just on personal traits but also on her intent and mental state. In this paper, we show how variants …

Roy Schwartz, Maarten Sap, Yannis Konstas, Li Zilles, Yejin Choi, Noah A. Smith

In Proc. of CoNLL 2017.

PDF Code Poster

Automatic selection of context configurations for improved (and fast) class-specific word representations

This paper is concerned with identifying contexts useful for training word representation models for different word classes such as …

Ivan Vulić, Roy Schwartz, Ari Rappoport, Roi Reichart, Anna Korhonen

In Proc. of CoNLL 2017.

PDF Slides

Story Cloze Task: UW NLP System

This paper describes University of Washington NLP’s submission for the Linking Models of Lexical, Sentential and Discourse-level …

Roy Schwartz, Maarten Sap, Yannis Konstas, Li Zilles, Yejin Choi, Noah A. Smith

In Proc. of LSDSem 2017 shared task. Best performing system.

PDF Poster Slides

Symmetric Patterns and Coordinations: Fast and Enhanced Representations of Verbs and Adjectives

State-of-the-art word embeddings, which are often trained on bag-of-words (BOW) contexts, provide a high quality representation of …

Roy Schwartz, Roi Reichart, Ari Rappoport

In Proc. of NAACL 2016.

PDF Code Poster

Pattern-based methods for Improved Lexical Semantics and Word Embeddings

Roy Schwartz

PhD Thesis.

PDF

Symmetric Pattern Based Word Embeddings for Improved Word Similarity Prediction

We present a novel word level vector representation based on symmetric patterns (SPs). For this aim we automatically acquire SPs (e.g., …

Roy Schwartz, Roi Reichart, Ari Rappoport

In Proc. of CoNLL 2015.

PDF Code Slides

How Well Do Distributional Models Capture Different Types of Semantic Knowledge?

In recent years, distributional models (DMs) have shown great success in representing lexical semantics. In this work we show that the …

Dana Rubinstein, Effi Levi, Roy Schwartz, Ari Rappoport

In Proc. of ACL 2015.

PDF Poster

Minimally Supervised Classification to Semantic Categories using Automatically Acquired Symmetric Patterns

Classifying nouns into semantic categories (e.g., animals, food) is an important line of research in both cognitive science and natural …

Roy Schwartz, Roi Reichart, Ari Rappoport

In Proc. of COLING 2014.

PDF Slides

Authorship Attribution of Micro-Messages

Work on authorship attribution has traditionally focused on long texts. In this work, we tackle the question of whether the author of a …

Roy Schwartz, Oren Tsur, Ari Rappoport, Moshe Koppel

In Proc. of EMNLP 2013.

PDF Slides

Learnability-based Syntactic Annotation Design

There is often more than one way to represent syntactic structures, even within a given formalism. Selecting one representation over …

Roy Schwartz, Omri Abend, Ari Rappoport

In Proc. of COLING 2012.

PDF Code Slides

Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation

Dependency parsing is a central NLP task. In this paper we show that the common evaluation for unsupervised dependency parsing is …

Roy Schwartz, Omri Abend, Roi Reichart, Ari Rappoport

In Proc. of ACL 2011.

PDF Code Slides