Accelerating the inference of large language models (LLMs) is a critical challenge in generative AI. Speculative decoding (SD) methods …
Nadav Timor, Jonathan Mamou, Daniel Korat, Moshe Berchansky, Oren Pereg, Gaurav Jain, Roy Schwartz, Moshe Wasserblat, David Harel
arXiv:2502.05202.