Yossi Adi

My name is Yossi Adi. I'm an Assistant Professor (Senior Lecturer) at the School of Computer Science and Engineering at the Hebrew University and a Research Scientist (part time) at the FAIR team, Meta. I'm also a professional drummer with Lucille Crew. I completed my Ph.D. in computer science at Bar-Ilan University under the supervision of Prof. Joseph Keshet.

My research interests are focused on developing and analyzing machine learning and deep learning algorithms for speech and language applications. In general, my research usually follows the KISS philosophy. My complete CV can be found here.

As a musician, I'm the drummer and one of the band members of Lucille Crew, an international Groove, Hip Hop and Soul collective based in Tel Aviv.

yossi.adi at mail.huji.ac.il

Our research paper CAFA: a Controllable Automatic Foley Artist got accepted to ICCV 2025!
Together with Joseph Keshet I am organizing iSpeech-2025, the second Israeli seminar on Speech & Audio processing using neural nets.
Two research papers got accepted at INTERSPEECH 2025! More details in the publication section.
Our research paper Slamming: Training a Speech Language Model on One GPU in a Day got accepted to the Findings of ACL 2025!
Four research papers got accepted at ICASSP 2025! More details in the publication section.
Our research paper Discrete Flow Matching got accepted to NeurIPS 2024!
Our research paper Transformers are Multi-State RNNs got accepted to EMNLP 2024!
Our research paper The Larger the Better? Improved LLM Code-Generation via Budget Reallocation got accepted to CoLM 2024!
Two research papers got accepted at ISMIR 2024! More details in the publication section.
Five research papers got accepted at Interspeech 2024! More details in the publication section.
We are co-organizing a tutorial at Interspeech 2024 on Recent Advances in Speech Language Models!
We are co-organizing two special sessions at Interspeech 2024 on SpeechLMs and Discrete speech representation for speech processing!
Our research paper Masked Audio Generation using a Single Non-Autoregressive Transformer got accepted at ICLR 2024!
Two research papers got accepted at AAAI 2024! More details in the publication section.
I am grateful to be the recipient of the Alon Scholarship for Outstanding Faculty, 2023!
Together with Joseph Keshet and Tal Rosenwein I am organizing iSpeech-2023, the first Israeli seminar on Speech & Audio processing using neural nets.
I gave a talk about Generative Speech and Audio Modeling at the TLV AI Week.
Our research paper AudioGen: Textually Guided Audio Generation got accepted at ICLR 2023!
Our research paper Generative Spoken Dialogue Language Modeling got accepted at TACL!
I joined the School of Computer Science and Engineering at the Hebrew University of Jerusalem as an Assistant Professor!
I am humbled to be part of TheMarker Magazine's 40 Under 40 list (in Hebrew): [The Marker].
An article about my music, AI research, and everything in between (in Hebrew): [Mako].
I am grateful to be the recipient of the Best Doctoral Dissertation Award in 2020 given by the Israeli Association for Artificial Intelligence (IAAI)!

Spoken Language Processing Research Lab

I currently lead the Spoken Language Processing Research Lab (SLP-RL) at the Hebrew University of Jerusalem, Israel. The lab focuses on spoken language processing, including automatic speech recognition, speech enhancement, spoken language understanding, and machine learning techniques designed for audio, speech and language applications.

For more information on the lab members, activity, publications, etc. please visit the following link.

Publications

Conferences and Workshops Proceedings

Roi Benita*, Michael Finkelson, Tavi Halperin, Gleb Sterkin, Yossi Adi. CAFA: a Controllable Automatic Foley Artist. The IEEE/CVF International Conference on Computer Vision (ICCV), 2025, [PDF, Project page].
Iddo Yosha, Dorin Shteyman, Yossi Adi. WHISTRESS: Enriching Transcriptions with Sentence Stress Detection. The 26th Annual Conference of the International Speech Communication Association (Interspeech), 2025, [PDF, Project page].
Nadav Har-Tuv, Or Tal, Yossi Adi. PAST: Phonetic-Acoustic Speech Tokenizer. The 26th Annual Conference of the International Speech Communication Association (Interspeech), 2025, [PDF, Project page].
Gallil Maimon*, Avishai Elmakies*, Yossi Adi. Slamming: Training a Speech Language Model on One GPU in a Day. Findings of The 63rd Annual Meeting of the Association for Computational Linguistics (ACL), 2025, [PDF, Project page].
Guy Yariv, Yuval Kirstain, Amit Zohar, Shelly Sheynin, Yaniv Taigman, Yossi Adi, Sagie Benaim, Adam Polyak. Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation. The Conference on Computer Vision and Pattern Recognition (CVPR), 2025, [PDF, Project page].
Gallil Maimon*, Amit Roth*, Yossi Adi. A Suite for Acoustic Language Model Evaluation. The 50th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025, [PDF, Code & Data].
Ella Zeldes, Or Tal, Yossi Adi. Enhancing TTS Stability in Hebrew using Discrete Semantic Units. The 50th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025, [PDF, Code & Demo].
Robin San Roman, Pierre Fernandez, Antoine Deleforge, Yossi Adi, Romain Serizel. Latent Watermarking of Audio Generative Models. The 50th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025, [PDF].
Simon Rouard*, Robin San Roman*, Yossi Adi, Axel Roebel. MusicGen-Stem: Multi-stem music generation and edition through autoregressive modeling. The 50th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025, [PDF].
Itai Gat, Tal Remez, Neta Shaul, Felix Kreuk, Ricky T. Q. Chen, Gabriel Synnaeve, Yossi Adi, Yaron Lipman. Discrete Flow Matching. The 38th Annual Conference on Neural Information Processing Systems (NeurIPS), 2024, [PDF].
Matanel Oren*, Michael Hassid*, Yossi Adi, Roy Schwartz. Transformers are Multi-State RNNs. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024, [PDF, Code].
Michael Hassid*, Tal Remez*, Jonas Gehring, Roy Schwartz, Yossi Adi. The Larger the Better? Improved LLM Code-Generation via Budget Reallocation. The 1st Conference on Language Modeling (CoLM), 2024, [PDF, Samples].
Or Tal*, Alon Ziv*, Itai Gat, Felix Kreuk, Yossi Adi. Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation. The 25th International Society for Music Information Retrieval (ISMIR) Conference, 2024, [PDF].
Simon Rouard, Jade Copet, Yossi Adi, Axel Roebel, Alexandre Defossez. Audio Conditioning for Music Generation via Discrete Bottleneck Features. The 25th International Society for Music Information Retrieval (ISMIR) Conference, 2024, [PDF].
Shoval Messica, Yossi Adi. NAST: Noise Aware Speech Tokenization for Speech Language Models. The 25th Annual Conference of the International Speech Communication Association (Interspeech), 2024, [PDF, Code].
Xuankai Chang, Jiatong Shi, Shinji Watanabe, Yossi Adi, Xie Chen, Qin Jin. The Interspeech 2024 Challenge on Speech Processing Using Discrete Units. The 25th Annual Conference of the International Speech Communication Association (Interspeech), 2024, [PDF].
Shiran Aziz, Yossi Adi, Shmuel Peleg. Audio Enhancement from Multiple Crowdsourced Recordings: A Simple and Effective Baseline. The 25th Annual Conference of the International Speech Communication Association (Interspeech), 2024, [PDF].
Amit Roth, Arnon Turetzky, Yossi Adi. A Language Modeling Approach to Diacritic-Free Hebrew TTS. The 25th Annual Conference of the International Speech Communication Association (Interspeech), 2024, [PDF, Code].
Arnon Turetzky, Or Tal, Yael Segal-Feldman, Yehoshua Dissen, Ella Zeldes, Amit Roth, Eyal Cohen, Yosi Shrem, Bronya R. Chernyak, Olga Seleznova, Joseph Keshet, Yossi Adi. HebDB: a Weakly Supervised Dataset for Hebrew Speech Processing. The 25th Annual Conference of the International Speech Communication Association (Interspeech), 2024, [PDF, Data].
Jean-Marie Lemercier*, Simon Rouard*, Jade Copet, Yossi Adi, Alexandre Defossez. An Independence-promoting Loss for Music Generation with Language Models. The 41st International Conference on Machine Learning (ICML), 2024, [PDF].
Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Defossez, Jade Copet, Gabriel Synnaeve, Yossi Adi. Masked Audio Generation using a Single Non-Autoregressive Transformer. International Conference on Learning Representations (ICLR), 2024, [PDF, Code & Samples].
Guy Yariv, Itai Gat, Sagie Benaim, Lior Wolf, Idan Schwartz*, Yossi Adi*. Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation. The Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI), 2024, [PDF, Code & Samples].
Guy Lorberbom*, Itai Gat*, Yossi Adi, Alex Schwing, Tamir Hazan. Layer Collaboration in the Forward-Forward Algorithm. The Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI), 2024, [PDF].
Po-chun Hsu, Ali Elkahky, Wei-Ning Hsu, Yossi Adi, Tu Anh Nguyen, Jade Copet, Emmanuel Dupoux, Hung-yi Lee, Abdelrahman Mohamed. Low-Resource Self-Supervised Learning with SSL-Enhanced TTS. Workshop on Speech Foundation Models and their Performance Benchmarks (SPARKS), 2023, [PDF].
Robin Algayres, Yossi Adi, Tu Anh Nguyen, Jade Copet, Gabriel Synnaeve, Benoit Sagot, Emmanuel Dupoux. Generative Spoken Language Model Based on Continuous Word-sized Audio Tokens. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023, [PDF].
Gallil Maimon, Yossi Adi. Speaking Style Conversion With Discrete Self-Supervised Units. Findings of Empirical Methods in Natural Language Processing (EMNLP), 2023, [PDF, Samples & Code].
Michael Hassid, Tal Remez, Tu Anh Nguyen, Itai Gat, Alexis Conneau, Felix Kreuk, Jade Copet, Alexandre Defossez, Gabriel Synnaeve, Emmanuel Dupoux, Roy Schwartz, Yossi Adi. Textually Pretrained Speech Language Models. The 37th Annual Conference on Neural Information Processing Systems (NeurIPS), 2023, [PDF, Samples].
Robin San Roman, Yossi Adi, Antoine Deleforge, Romain Serizel, Gabriel Synnaeve, Alexandre Defossez. From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion. The 37th Annual Conference on Neural Information Processing Systems (NeurIPS), 2023, [PDF, Code, Samples].
Matthew Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, Wei-Ning Hsu. Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale. The 37th Annual Conference on Neural Information Processing Systems (NeurIPS), 2023, [PDF, Samples].
Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Defossez. Simple and Controllable Music Generation. The 37th Annual Conference on Neural Information Processing Systems (NeurIPS), 2023, [PDF, Code, Samples, Demo].
Itai Gat, Felix Kreuk, Ann Lee, Jade Copet, Gabriel Synnaeve, Emmanuel Dupoux, Yossi Adi. Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling. The 20th International Conference on Spoken Language Translation, 2023, [PDF].
Guy Yariv, Itai Gat, Lior Wolf, Yossi Adi*, Idan Schwartz*. Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation. The 24th Annual Conference of the International Speech Communication Association (Interspeech), 2023, [PDF, Code, Demo, Samples].
Tu Anh Nguyen, Wei-Ning Hsu, Antony d'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarandi, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi*, Emmanuel Dupoux*. Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis. The 24th Annual Conference of the International Speech Communication Association (Interspeech), 2023, [PDF, Dataset].
Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi. ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement. The Conference on Computer Vision and Pattern Recognition (CVPR), 2023, [PDF, Samples].
Moshe Mandel, Or Tal, Yossi Adi. AERO: Audio Super Resolution in the Spectral Domain. The 48th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, [PDF, Code, Samples].
Roy Sheffer, Yossi Adi. I Hear Your True Colors: Image Guided Audio Generation. The 48th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, [PDF, Code, Samples].
Amitay Sicherman, Yossi Adi. Analyzing Discrete Self Supervised Speech Representation For Spoken Language Modeling. The 48th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, [PDF, Code, Tool].
Wen-Chin Huang, Benjamin Peloquin, Justine Kao, Changhan Wang, Hongyu Gong, Elizabeth Salesky, Yossi Adi, Ann Lee, Peng-Jen Chen. A Holistic Cascade System, Benchmark, and Human Evaluation Protocol For Expressive Speech-To-Speech Translation. The 48th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, [PDF, Demo].
Ali Elkahky, Wei-Ning Hsu, Paden Tomasello, Tu-Anh Nguyen, Robin Algayres, Yossi Adi, Jade Copet, Emmanuel Dupoux, Abdelrahman Mohamed. Do Coarser Units Benefit Cluster Prediction-Based Speech Pre-Training?. The 48th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, [PDF].
Felix Kreuk, Gabriel Synnaeve, Adam Polyak, Uriel Singer, Alexandre Defossez, Jade Copet, Devi Parikh, Yaniv Taigman, Yossi Adi. AudioGen: Textually Guided Audio Generation. International Conference on Learning Representations (ICLR), 2023, [PDF, Samples].
Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu-Anh Nguyen, Morgane Riviere, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi. Textless Speech Emotion Conversion using Decomposed and Discrete Representations. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022, [PDF, Samples, Blog].
Itai Gat, Yossi Adi, Alexander Schwing, Tamir Hazan. On the Importance of Gradient Norm in PAC-Bayesian Bounds. The 36th Annual Conference on Neural Information Processing Systems (NeurIPS), 2022, [PDF].
Paden Tomasello, Akshat Shrivastava, Daniel Lazar, Po-Chun Hsu, Duc Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossi Adi, Robin Algayres, Tu Anh Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman Mohamed. STOP: A dataset for Spoken Task Oriented Semantic Parsing. IEEE Spoken Language Technology Workshop (SLT), 2022, [PDF].
Arnon Turetzky, Tzvi Michelson, Yossi Adi, Shmuel Peleg. Deep Audio Waveform Prior. The 23rd Annual Conference of the International Speech Communication Association (Interspeech), 2022, [PDF].
Or Tal, Moshe Mandel, Felix Kreuk, Yossi Adi. A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement. The 23rd Annual Conference of the International Speech Communication Association (Interspeech), 2022, [PDF, Code].
Shahaf Bassan, Yossi Adi, Jeffrey S. Rosenschein. Unsupervised Symbolic Music Segmentation using Ensemble Temporal Prediction Errors. The 23rd Annual Conference of the International Speech Communication Association (Interspeech), 2022, [PDF].
Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee. Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation. The 23rd Annual Conference of the International Speech Communication Association (Interspeech), 2022, [PDF].
Maureen de Seyssel, Marvin Lavechin, Yossi Adi, Emmanuel Dupoux, Guillaume Wisniewski. Probing Phoneme, Language and Speaker Information in Unsupervised Speech Representations. The 23rd Annual Conference of the International Speech Communication Association (Interspeech), 2022, [PDF].
Ann Lee, Hongyu Gong, Paul-Ambroise Duquenne, Holger Schwenk, Peng-Jen Chen, Changhan Wang, Sravya Popuri, Yossi Adi, Juan Pino, Jiatao Gu, Wei-Ning Hsu. Textless Speech-to-Speech Translation on Real Data. Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2022, [PDF, Code, Samples].
Eugene Kharitonov, Jade Copet, Kushal Lakhotia, Tu-Anh Nguyen, Paden Tomasello, Ann Lee, Ali Elkahky, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi. textless-lib: a Library for Textless Spoken Language Processing. Conference of the North American Chapter of the Association for Computational Linguistics (NAACL): System Demonstrations, 2022, [PDF, Code].
Eugene Kharitonov*, Ann Lee*, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu-Anh Nguyen, Morgane Riviere, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu. Text-Free Prosody-Aware Generative Spoken Language Modeling. The 60th Annual Meeting of the Association for Computational Linguistics (ACL), 2022, [PDF].
Ann Lee, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Xutai Ma, Adam Polyak, Yossi Adi, Qing He, Yun Tang, Juan Pino, Wei-Ning Hsu. Direct Speech-to-Speech Translation with Discrete Units. The 60th Annual Meeting of the Association for Computational Linguistics (ACL), 2022, [PDF].
Alon Berliner, Guy Rotman, Yossi Adi, Roi Reichart, Tamir Hazan. Learning Discrete Structured Variational Auto-Encoder Using Natural Evolution Strategies. International Conference on Learning Representations (ICLR), 2022, [PDF, Code].
Efthymios Tzinis, Yossi Adi, Vamsi K. Ithapu, Buye Xu, Anurag Kumar. Continual Self-Training with Bootstrapped Remixing For Speech Enhancement. The 47th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, [PDF].
Changhan Wang*, Wei-Ning Hsu*, Yossi Adi, Adam Polyak, Ann Lee, Peng-Jen Chen, Jiatao Gu, Juan Pino. FAIRSEQ S^2: A Scalable and Integrable Speech Synthesis Toolkit. Conference on Empirical Methods in Natural Language Processing (EMNLP): System Demonstrations, 2021, [PDF, Code].
Ori Kabeli*, Yossi Adi*, Zhenyu Tang, Buye Xu, Anurag Kumar. Online Self-Attentive Gated RNNs for Real-Time Speaker Separation. Proceedings of Machine Learning in Speech and Language Processing Workshop, 2021, [PDF, Samples].
Adam Polyak, Yossi Adi, Jade Copet, Eugene Kharitonov, Kushal Lakhotia, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux. Speech Resynthesis from Discrete Disentangled Self-Supervised Representations. The 22nd Annual Conference of the International Speech Communication Association (Interspeech), 2021, [PDF, Samples, Code].
Shahar Segal, Yossi Adi, Benny Pinkas, Carsten Baum, Chaya Ganesh, Joseph Keshet. Fairness in the Eyes of the Data: Certifying Machine-Learning Models. The Fourth AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES), 2021, [PDF].
Adam Polyak, Lior Wolf, Yossi Adi, Ori Kabeli, Yaniv Taigman. High Fidelity Speech Regeneration with Application to Speech Enhancement. The 46th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, [PDF, Samples].
Shlomo E. Chazan, Lior Wolf, Eliya Nachmani, Yossi Adi. Single Channel Voice Separation for Unknown Number of Speakers Under Reverberant and Noisy Settings. The 46th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, [PDF, Samples].
Felix Kreuk, Joseph Keshet, Yossi Adi. Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation. The 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, [PDF, Code].
Adam Polyak, Lior Wolf, Yossi Adi, Yaniv Taigman. Unsupervised Cross-Domain Singing Voice Conversion. The 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, [PDF, Samples].
Alexandre Defossez, Gabriel Synnaeve, Yossi Adi. Real Time Speech Enhancement in the Waveform Domain. The 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, [PDF, Samples, Code].
Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, Joseph Keshet. Hide and Speak: Towards Deep Neural Networks for Speech Steganography. The 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, [PDF, Samples, Code].
Eliya Nachmani, Yossi Adi, Lior Wolf. Voice Separation with an Unknown Number of Multiple Speakers. The 37th International Conference on Machine Learning (ICML), 2020, [PDF, Samples, Code, Blog].
Ben Goldberger, Yossi Adi, Joseph Keshet, Guy Katz. Minimal Modifications of Deep Neural Networks using Verification. The 23rd International Conference on Logic for Programming, Artificial Intelligence and Reasoning (LPAR), 2020, [PDF, Code].
Felix Kreuk, Yaniv Sheena, Joseph Keshet, Yossi Adi. Phoneme Boundary Detection using Learnable Segmental Features. The 45th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, [PDF, Code].
Yossi Adi, Neil Zeghidour, Ronan Collobert, Nicolas Usunier, Vitaliy Liptchinsky, Gabriel Synnaeve. To Reverse The Gradient or Not: An Empirical Comparison of Adversarial and Multi-Task Learning in Speech Recognition. The 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, [PDF].
Gabi Shalev, Yossi Adi, Joseph Keshet. Out-of-Distribution Detection using Multiple Semantic Label Representations. The 32nd Annual Conference on Neural Information Processing Systems (NeurIPS), 2018, [PDF].
Yossi Adi, Carsten Baum, Moustapha Cisse, Benny Pinkas, Joseph Keshet. Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring. Usenix Security, 2018, [PDF, Blog].
Felix Kreuk, Yossi Adi, Moustapha Cisse, Joseph Keshet. Fooling End-to-End Speaker Verification With Adversarial Examples. The 43rd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, [PDF].
Moustapha Cisse, Yossi Adi, Natalia Neverova, Joseph Keshet. Houdini: Fooling Deep Structured Visual and Speech Recognition Models with Adversarial Examples. The 31st Annual Conference on Neural Information Processing Systems (NeurIPS), 2017, [PDF].
Einat Naaman, Yossi Adi, Joseph Keshet. Learning Similarity Function for Pronunciation Variations. The 18th Annual Conference of the International Speech Communication Association (Interspeech), 2017, [PDF].
Yaniv Sheena, Misha Hejna, Yossi Adi, Joseph Keshet. Automatic Measurement of Pre-aspiration. The 18th Annual Conference of the International Speech Communication Association (Interspeech), 2017, [PDF].
Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi and Yoav Goldberg. Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks. International Conference on Learning Representations (ICLR), 2017, [PDF].
Yossi Adi, Joseph Keshet, Emily Cibelli, and Matt Goldrick. Sequence Segmentation Using Joint RNN and Structured Prediction Models. The 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, [PDF].
Yossi Adi, Joseph Keshet, Olga Dmitrieva and Matt Goldrick. Automatic Measurement of Voice Onset Time and Prevoicing using Recurrent Neural Networks. The 17th Annual Conference of the International Speech Communication Association (Interspeech), 2016, [PDF, Code].
Yossi Adi, Joseph Keshet and Matt Goldrick. Vowel Duration Measurement Using Deep Neural Networks. The 25th IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2015, [PDF].

Journals

Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli. Scaling Speech Technology to 1,000+ Languages. Journal of Machine Learning Research, 2024, [PDF, Blog].
Alexandre Defossez*, Jade Copet*, Gabriel Synnaeve^, Yossi Adi^. High Fidelity Neural Audio Compression. Transactions on Machine Learning Research (TMLR) [Featured, Reproducibility], 2023, [PDF, Code].
Tu Anh Nguyen, Eugene Kharitonov, Jade Copet, Yossi Adi, Wei-Ning Hsu, Ali Elkahky, Paden Tomasello, Robin Algayres, Benoit Sagot, Abdelrahman Mohamed, Emmanuel Dupoux. Generative Spoken Dialogue Language Modeling. Transactions of the Association for Computational Linguistics (TACL), 2023, [PDF, Samples].
Alexandre Defossez*, Yossi Adi*, Gabriel Synnaeve. Differentiable Model Compression via Pseudo Quantization Noise. Transactions on Machine Learning Research (TMLR), 2022, [PDF, Code].
Efthymios Tzinis, Yossi Adi, Vamsi Krishna Ithapu, Buye Xu, Paris Smaragdis, Anurag Kumar. RemixIT: Continual Self-Training of Speech Enhancement Models via Bootstrapped Remixing. IEEE Journal of Selected Topics in Signal Processing, 2022, [PDF].
Kushal Lakhotia*, Eugene Kharitonov*, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Benjamin Bolte, Tu-Anh Nguyen, Jade Copet, Alexei Baevski, Michael Auli, Alexis Conneau, Abdelrahman Mohamed, Emmanuel Dupoux. On Generative Spoken Language Modeling from Raw Audio. Transactions of the Association for Computational Linguistics (TACL), 2021, [PDF, Samples, Code, Blog].
Ke Tan, Buye Xu, Anurag Kumar, Eliya Nachmani, Yossi Adi. SAGRNN: Self-Attentive Gated RNN for Binaural Speaker Separation with Interaural Cue Preservation. IEEE Signal Processing Letters, 2020, [PDF, Samples].
Jacob T. Cohen, Alma Cohen, Limor Benyamini, Yossi Adi, Joseph Keshet. Predicting Glottal Closure Insufficiency using Fundamental Frequency Contour Analysis. Head & Neck, Journal of the Sciences and Specialities of the Head and Neck, 2019, [PDF].
Matthew Goldrick, Rhonda McClain, Emily Cibelli, Yossi Adi, Erin Gustafson, Cornelia Moers, and Joseph Keshet. The Influence of Lexical Selection Disruptions on Articulation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 2018, [PDF].
Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi and Yoav Goldberg. Analysis of Sentence Embedding Models using Prediction Tasks in Natural Language Processing. IBM Journal of Research and Development, 2017, [PDF].
Yossi Adi, Joseph Keshet, Emily Cibelli, Erin Gustafson, Cynthia Clopper and Matt Goldrick. Automatic Measurement of Vowel Duration via Structured Prediction. Journal of the Acoustical Society of America, 2016, [PDF, Code].
Yossi Adi and Joseph Keshet. StructED: Risk Minimization in Structured Prediction. Journal of Machine Learning Research, 2016, [PDF, Website, Code].

Preprints

Pooneh Mousavi, Gallil Maimon*, Adel Moumen*, Darius Petermann*, Jiatong Shi*, Haibin Wu*, Haici Yang*, Anastasia Kuznetsova*, Artem Ploujnikov, Ricard Marxer, Bhuvana Ramabhadran, Benjamin Elizalde, Loren Lugosch, Jinyu Li, Cem Subakan, Phil Woodland, Minje Kim, Hung-yi Lee, Shinji Watanabe, Yossi Adi, Mirco Ravanelli. Discrete Audio Tokens: More Than a Survey!. arXiv preprint arXiv:2506.10274 (2025), [PDF, Project page].
Or Tal, Felix Kreuk, Yossi Adi. Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation. arXiv preprint arXiv:2506.08570 (2025), [PDF, Project page].
Iddo Yosha, Gallil Maimon, Yossi Adi. StressTest: Can YOUR Speech LM Handle the Stress?. arXiv preprint arXiv:2505.22765 (2025), [PDF, Project page].
Michael Hassid, Gabriel Synnaeve, Yossi Adi, Roy Schwartz. Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning. arXiv preprint arXiv:2505.17813 (2025), [PDF].
Siddhant Arora*, Kai-Wei Chang*, Chung-Ming Chien*, Yifan Peng*, Haibin Wu*, Yossi Adi+, Emmanuel Dupoux+, Hung-Yi Lee+, Karen Livescu+, Shinji Watanabe+. On The Landscape of Spoken Language Models: A Comprehensive Survey. arXiv preprint arXiv:2504.08528 (2025), [PDF].
Gallil Maimon, Michael Hassid, Amit Roth, Yossi Adi. Scaling Analysis of Interleaved Speech-Text Language Models. arXiv preprint arXiv:2504.02398 (2025), [PDF, Project page].
Avishai Elmakies, Omri Abend, Yossi Adi. Unsupervised Speech Segmentation: A General Approach Using Speech Language Models. arXiv preprint arXiv:2501.03711 (2025), [PDF, Code].
Arnon Turetzky, Yossi Adi. LAST: Language Model Aware Speech Tokenization. arXiv preprint arXiv:2409.03701 (2024), [PDF].
Guy Yariv, Idan Schwartz, Yossi Adi*, Sagie Benaim*. Improving Visual Commonsense in Language Models via Multiple Image Generation. arXiv preprint arXiv:2406.13621 (2024), [PDF, Website, Code].
Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jeremy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Defossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve. Code Llama: Open Foundation Models for Code. arXiv preprint arXiv:2308.12950 (2023), [PDF].
Felix Kreuk, Yaniv Taigman, Adam Polyak, Jade Copet, Gabriel Synnaeve, Alexandre Defossez, Yossi Adi. Audio Language Modeling using Perceptually-Guided Discrete Representations. arXiv preprint arXiv:2211.01223 (2022), [PDF, Samples].
Yossi Adi, Yaniv Nemcovsky, Alex Schwing, Tamir Hazan. On The Generalization of Bayesian Deep Nets for Multi-class Classification. arXiv preprint arXiv:2002.09866 (2020), [PDF].

Teaching

2024-2025

2023-2024

2022-2023

Tips & Links

The purpose of this section is to provide useful links, tips, and advice for anyone doing research in ML / DL.

Research & PhD tips

10 tips for research and a phd, by Sebastian Ruder.
A survival guide to a PhD, by Andrej Karpathy.
An opinionated guide to ML research, by John Schulman.
PhD 101, by Volkan Cirik.
Some great advice for research students, by Jason Eisner.
When should I apply for grad school?, by Jason Eisner.
Bertrand Russell’s ten commandments, borrowed from Yonatan Belinkov's website.

Writing tips

A short guide to typesetting math in NLP papers, by Chris Dyer, Kevin Gimpel, and Noah Smith.
Three styles for writing a paper, by Stuart Shieber.
Writing advice, by Philip Resnik.
How to write a great research paper, by Simon Peyton Jones.
How to write a great research proposal, by Simon Peyton Jones.
Stylistic advice for scientific writing, by Jordan Boyd-Graber.

Other useful tips

How to give a great research talk, by Simon Peyton Jones.
Reviewer Guide, from ICLR 2021.
How to write good rebuttals, from Devi Parikh, Dhruv Batra, and Stefan Lee.

From the press

Our model MusicGen, which generates music from textual descriptions, got attention from the media: [TechCrunch].
Our model AudioGen, which generates audio from textual descriptions, got attention from the media: [New Scientist].
An article about my research on speech enhancement and source separation was published in The Marker magazine: [article].
I am humbled to be part of TheMarker Magazine's 40 Under 40 list (in Hebrew): [The Marker].
An article about my music, AI research, and everything in between (in Hebrew): [Mako].
Our work on single channel speaker source separation using deep nets got some attention from the media: [Globes (Hebrew), Venture Beat, Digital Trends].
A Medium blog post about our joint work with Carsten Baum, Moustapha Cissé, Benny Pinkas and Joseph Keshet on watermarking deep neural networks: [Medium].
Our joint work with Moustapha Cissé, Natalia Neverova, and Joseph Keshet about fooling structured deep learning models got attention from the media: [New Scientist, MIT Technology Review].

I gave a talk about Generative Speech and Audio Modeling at the TLV AI Week.
I had the pleasure of chatting about my research on the F2F podcast (in Hebrew): [podcast].
I was honored to be a keynote speaker at the International Conference on Artificial Intelligence - Academy and Industry.
I gave a talk at TECHINOVATiON, The Marker conference: [Talk].

Music

Spotify

Apple Music

Deezer

YouTube

SoundCloud

Bandcamp

Contact

School of Computer Science and Engineering.

The Hebrew University, Jerusalem, Israel, Edmond Safra Campus, Givat Ram.

Rothberg Building C, Room C432.

yossi.adi at mail.huji.ac.il.