Professor of Natural Language Processing

The Hebrew University of Jerusalem

Schwartz lab

Roy Schwartz's lab at the School of Computer Science and Engineering at The Hebrew University of Jerusalem studies Natural Language Processing (NLP). Our research is driven by the goal of making text understanding technology widely accessible—to doctors, teachers, researchers, and even curious teenagers. To be broadly adopted, NLP technology needs to be not only accurate but also reliable; models should provide explanations for their outputs; and the methods we use to evaluate them need to be convincing.
Our lab also studies methods for making NLP technology more efficient and green, both to decrease the environmental impact of the field and to lower the cost of AI research, thereby broadening participation in it.

Lab News

Can you leverage pre-trained text models for better speech models? Check out Michael’s new paper!

Three papers accepted to ACL (1 main, 2 findings)

Congrats to Daniel, Yuval, Aviad and Michael!

Our efficient NLP survey paper was accepted to TACL!

Proud to be part of this wonderful and diverse group of researchers!

Congrats to Boaz, Michael, Aviad, Yuval, and Daniel for defending their Master’s theses!

An awesome lab event in Nahal Halilim!

Thank you to all attendees of the Efficient NLP workshop in Dagstuhl!

See workshop report for more details!

Our efficient NLP policy has been officially adopted by the ACL exec!

See document link for more details.


Biases in Datasets

We analyze the datasets on which NLP models are trained. Looking carefully into these datasets, we uncover limitations and biases in the data collection process as well as the evaluation process. Our findings indicate that the recent success of neural models on many NLP tasks has been overestimated, and pave the way for the development of more reliable methods of evaluation.

Green NLP

The computation required for deep learning research has been doubling every few months, and it carries a surprisingly large carbon footprint. Moreover, its financial cost can make it difficult for academics, students, and researchers, in particular those from emerging economies, to engage in deep learning research. Our lab develops tools to make NLP technology more efficient and to improve the reporting of computational budgets.


Multimodal NLP

Humans learn about the world using input from multiple modalities. Machines can likewise leverage modalities beyond text to improve their textual understanding. Our lab studies methods for combining textual information with images, sounds, videos, and other data, with the goal of making models more robust and better able to generalize.

Understanding NLP

In recent years, deep learning has become the leading machine learning approach in NLP. Despite its wide adoption, the theory of deep learning lags behind its empirical success: many engineered systems are in commercial use without a solid scientific basis for their operation. Our research aims to bridge this gap between theory and practice by devising mathematical theories that link deep neural models to classical NLP models, such as weighted finite-state automata.

Recent Publications

Quickly discover relevant content by filtering publications.

Textually Pretrained Speech Language Models

Speech language models (SpeechLMs) process and generate acoustic data only, without textual supervision. In this work, we propose …

Finding the SWEET Spot: Analysis and Improvement of Adaptive Inference in Low Resource Settings

Adaptive inference is a simple method for reducing inference costs. The method works by maintaining multiple classifiers of different …

Fighting Bias with Bias: Promoting Model Robustness by Amplifying Dataset Biases

NLP models often rely on superficial cues known as dataset biases to achieve impressive performance, and can fail on examples where …

Curating Datasets for Better Performance with Example Training Dynamics

The landscape of NLP research is dominated by large-scale models training on colossal datasets, relying on data quantity rather than …

Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images

Weird, unusual, and uncanny images pique the curiosity of observers because they challenge common sense. For example, an image released …


  • School of Computer Science and Engineering, Edmond Safra Campus, Givat Ram, The Hebrew University, Jerusalem, 9190401
  • Rothberg Building C, Room C503