iSpeech 2023 is the first Israeli seminar on speech and audio processing using neural networks. It is a venue for presenting the most recent work on the science and technology of spoken language processing in both academia and industry. Our goal is to bring the research community together to exchange ideas, form collaborations, and present its latest work.
This year, we will have two keynote speakers, Shinji Watanabe (CMU) and Gabriel Synnaeve (FAIR), alongside additional talks and presentations from researchers and students in the local Israeli ecosystem.
We encourage researchers and students to submit their work for presentation at the conference. We will consider: (i) papers accepted within the past year (both journal and conference publications); and (ii) single-page extended abstracts presenting preliminary results from promising research directions.
Note: the conference will be live-streamed at the following link.
Speaker: Shinji Watanabe (CMU).
Title: Explainable End-to-End Neural Networks for Spoken Language Processing.
Abstract: This presentation will showcase our group's efforts to integrate various spoken language processing modules into a single end-to-end neural network. Our focus will be on far-field conversation recognition, and we will demonstrate how we have successfully unified automatic speech recognition, denoising, dereverberation, separation, and localization while maintaining explainability. By utilizing self-supervised learning, pre-training/fine-tuning strategies, and multi-task learning within our integrated network, we have achieved the best performance reported in the literature on several noisy, reverberant speech recognition benchmarks, even matching clean speech recognition performance. Additionally, we will provide other examples demonstrating the integration of automatic speech recognition with machine translation and natural language understanding for spoken language understanding and speech translation tasks. Our code and models are publicly available through the ESPnet toolkit, which can be accessed at https://github.com/espnet/espnet.
Speaker: Gabriel Synnaeve (FAIR).
Title: A journey into end-to-end models, from ASR to Music.