SLP-RL

SLP Research Lab

The School of Computer Science and Engineering
The Hebrew University of Jerusalem

The Spoken Language Processing Research Lab (SLP-RL), led by Dr. Yossi Adi at The Hebrew University of Jerusalem, is a dynamic hub for cutting-edge research. Our diverse research interests span spoken language modeling, automatic speech recognition, speech enhancement, speech and audio generation, and machine learning for audio, speech, and language processing.

Our lab collaborates with the global research community to advance speech technology through machine learning and deep learning tools. Our goal is to create adaptive systems that enrich spoken human communication across languages.

If you would like to pursue graduate studies with us (M.Sc. or Ph.D.), please send your CV, research interests, and example code projects to yossi.adi@mail.huji.ac.il.

News

Members

Faculty

Yossi Adi

Ph.D. Students

Alon Ziv
Gallil Maimon
Or Tal (co-advising with Ami Wiesel)
Arnon Turetzky (co-advising with Shmuel Peleg)
Michael Hassid (co-advising with Roy Schwartz)

M.Sc. Students

Ella Zeldes
Dor Tenenboim
Shoval Messica
Amit Roth
Guy Yariv

Textless NLP

In Textless NLP, our goal is to build large language models that process audio inputs directly, without access to any textual supervision. Achieving this would benefit languages that lack large textual resources or a standard orthography. It would also benefit high-resource languages where (i) the oral and written forms often mismatch, and (ii) linguistically relevant signals are absent from text (e.g., intonation).

Audio Research

The SLP Research Lab actively conducts research and advances neural architectures to address fundamental challenges in audio processing. This encompasses a wide range of tasks, including background noise removal, artificial bandwidth extension (a.k.a. audio super-resolution), audio compression, and voice conversion. For additional details, please refer to the publications section.

Hebrew Speech Technologies

Speech technology has made great progress over the past decade and has been integrated into many consumer products. Despite this progress, most models were developed for English and other high-resource languages. In this project, our goal is to build speech technologies (automatic speech recognition and text-to-speech) for the Hebrew language, along with constructing a large-scale Hebrew dataset.

Publications

Here is a list of selected publications from our group. For the full list of publications, see Dr. Adi's Google Scholar profile.

2024

  1. Or Tal*, Alon Ziv*, Itai Gat, Felix Kreuk, Yossi Adi. Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation. The 25th International Society for Music Information Retrieval (ISMIR) Conference, 2024, [PDF].
  2. Simon Rouard, Jade Copet, Yossi Adi, Axel Roebel, Alexandre Defossez. Audio Conditioning for Music Generation via Discrete Bottleneck Features. The 25th International Society for Music Information Retrieval (ISMIR) Conference, 2024, [PDF].
  3. Shoval Messica, Yossi Adi. NAST: Noise Aware Speech Tokenization for Speech Language Models. The 25th Annual Conference of the International Speech Communication Association (Interspeech), 2024, [PDF].
  4. Xuankai Chang, Jiatong Shi, Shinji Watanabe, Yossi Adi, Xie Chen, Qin Jin. The Interspeech 2024 Challenge on Speech Processing Using Discrete Units. The 25th Annual Conference of the International Speech Communication Association (Interspeech), 2024, [PDF].
  5. Shiran Aziz, Yossi Adi, Shmuel Peleg. Audio Enhancement from Multiple Crowdsourced Recordings: A Simple and Effective Baseline. The 25th Annual Conference of the International Speech Communication Association (Interspeech), 2024, [PDF].
  6. Amit Roth, Arnon Turetzky, Yossi Adi. A Language Modeling Approach to Diacritic-Free Hebrew TTS. The 25th Annual Conference of the International Speech Communication Association (Interspeech), 2024, [PDF].
  7. Arnon Turetzky, Or Tal, Yael Segal-Feldman, Yehoshua Dissen, Ella Zeldes, Amit Roth, Eyal Cohen, Yosi Shrem, Bronya R. Chernyak, Olga Seleznova, Joseph Keshet, Yossi Adi. HebDB: a Weakly Supervised Dataset for Hebrew Speech Processing. The 25th Annual Conference of the International Speech Communication Association (Interspeech), 2024, [PDF].
  8. Jean-Marie Lemercier*, Simon Rouard*, Jade Copet, Yossi Adi, Alexandre Defossez. An Independence-promoting Loss for Music Generation with Language Models. The 41st International Conference on Machine Learning (ICML), 2024, [PDF].
  9. Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Defossez, Jade Copet, Gabriel Synnaeve, Yossi Adi. Masked Audio Generation using a Single Non-Autoregressive Transformer. International Conference on Learning Representations (ICLR), 2024, [PDF, Code & Samples].
  10. Guy Yariv, Itai Gat, Sagie Benaim, Lior Wolf, Idan Schwartz*, Yossi Adi*. Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation. The Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI), 2024, [PDF, Code & Samples].
  11. Guy Lorberbom*, Itai Gat*, Yossi Adi, Alex Schwing, Tamir Hazan. Layer Collaboration in the Forward-Forward Algorithm. The Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI), 2024, [PDF].

2023

  1. Gallil Maimon, Yossi Adi. Speaking Style Conversion With Discrete Self-Supervised Units. Findings of Empirical Methods in Natural Language Processing (EMNLP), 2023, [PDF, Samples & Code].
  2. Robin Algayres, Yossi Adi, Tu Anh Nguyen, Jade Copet, Gabriel Synnaeve, Benoit Sagot, Emmanuel Dupoux. Generative Spoken Language Model Based on Continuous Word-sized Audio Tokens. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023, [PDF].
  3. Michael Hassid, Tal Remez, Tu Anh Nguyen, Itai Gat, Alexis Conneau, Felix Kreuk, Jade Copet, Alexandre Defossez, Gabriel Synnaeve, Emmanuel Dupoux, Roy Schwartz, Yossi Adi. Textually Pretrained Speech Language Models. The 37th Annual Conference on Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 2023, [PDF, Samples].
  4. Robin San Roman, Yossi Adi, Antoine Deleforge, Romain Serizel, Gabriel Synnaeve, Alexandre Défossez. From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion. The 37th Annual Conference on Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 2023, [PDF, Code, Samples].
  5. Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez. Simple and Controllable Music Generation. The 37th Annual Conference on Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 2023, [PDF, Code, Samples, Demo].
  6. Guy Yariv, Itai Gat, Lior Wolf, Yossi Adi*, Idan Schwartz*. Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation. The 24th Annual Conference of the International Speech Communication Association (Interspeech), 2023, [PDF, Code, Demo, Samples].
  7. Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi. ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement. The Conference on Computer Vision and Pattern Recognition (CVPR), 2023, [PDF, Samples].
  8. Moshe Mandel, Or Tal, Yossi Adi. AERO: Audio Super Resolution in the Spectral Domain. The 48th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, [PDF, Code, Samples].
  9. Roy Sheffer, Yossi Adi. I Hear Your True Colors: Image Guided Audio Generation. The 48th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, [PDF, Code, Samples].
  10. Amitay Sicherman, Yossi Adi. Analyzing Discrete Self Supervised Speech Representation For Spoken Language Modeling. The 48th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, [PDF, Code, Tool].
  11. Felix Kreuk, Gabriel Synnaeve, Adam Polyak, Uriel Singer, Alexandre Defossez, Jade Copet, Devi Parikh, Yaniv Taigman, Yossi Adi. AudioGen: Textually Guided Audio Generation. International Conference on Learning Representations (ICLR), 2023, [PDF, Samples].

2021-2022

  1. Itai Gat, Yossi Adi, Alexander Schwing, Tamir Hazan. On the Importance of Gradient Norm in PAC-Bayesian Bounds. The 36th Annual Conference on Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 2022, [PDF].
  2. Arnon Turetzky, Tzvi Michelson, Yossi Adi, Shmuel Peleg. Deep Audio Waveform Prior. The 23rd Annual Conference of the International Speech Communication Association (Interspeech), 2022, [PDF].
  3. Or Tal, Moshe Mandel, Felix Kreuk, Yossi Adi. A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement. The 23rd Annual Conference of the International Speech Communication Association (Interspeech), 2022, [PDF, Code].
  4. Shahaf Bassan, Yossi Adi, Jeffrey S. Rosenschein. Unsupervised Symbolic Music Segmentation using Ensemble Temporal Prediction Errors. The 23rd Annual Conference of the International Speech Communication Association (Interspeech), 2022, [PDF].
  5. Alon Berliner, Guy Rotman, Yossi Adi, Roi Reichart, Tamir Hazan. Learning Discrete Structured Variational Auto-Encoder Using Natural Evolution Strategies. International Conference on Learning Representations (ICLR), 2022, [PDF, Code].
  6. Efthymios Tzinis, Yossi Adi, Vamsi Krishna Ithapu, Buye Xu, Paris Smaragdis, Anurag Kumar. RemixIT: Continual Self-Training of Speech Enhancement Models via Bootstrapped Remixing. IEEE Journal of Selected Topics in Signal Processing, 2022, [PDF].
  7. Shahar Segal, Yossi Adi, Benny Pinkas, Carsten Baum, Chaya Ganesh, Joseph Keshet. Fairness in the Eyes of the Data: Certifying Machine-Learning Models. The Fourth AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES), 2021, [PDF].

Open-source

At our laboratory, we are dedicated to open-source and open-research principles. All of our source code and pre-trained models are publicly available, and we actively gather diverse spoken datasets in multiple languages for training and evaluating speech technologies. By embracing openness, we aim to drive innovation and collaboration in the field of speech technology globally.

Code & Datasets

All source code, pre-trained models, and datasets are available on the SLP Research Lab GitHub page.