A single channel speech enhancement technique exploiting human auditory masking properties

Nsabimana, F. X.; Subbaraman, V.; Zölzer, U.

doi:https://doi.org/10.5194/ars-8-95-2010

Articles | Volume 8

© Author(s) 2010. This work is distributed under
the Creative Commons Attribution 3.0 License.

01 Oct 2010

Abstract. To enhance extremely corrupted speech signals, an Improved Psychoacoustically Motivated Spectral Weighting Rule (IPMSWR) is proposed that controls the predefined residual noise level by a time-frequency dependent parameter. Unlike conventional Psychoacoustically Motivated Spectral Weighting Rules (PMSWR), the level of the residual noise is varied throughout the enhanced speech, based on the discrimination between regions of speech presence and speech absence by means of the segmental SNR within critical bands. Controlling the residual noise level in noise-only regions in this way avoids the unpleasant residual noise perceived at very low SNRs. Deriving the gain coefficients requires computing the masking curve and estimating the corrupting noise power. Since the clean speech is generally not available to a single channel speech enhancement technique, the rough clean speech components needed to compute the masking curve are obtained using advanced spectral subtraction techniques. To estimate the corrupting noise, a new technique is employed that relies on noise power estimation using rapid adaptation and recursive smoothing principles. The performance of the proposed approach is compared objectively and subjectively with that of conventional approaches to highlight the aforementioned improvement.
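The idea of a residual noise level that depends on speech presence within a band can be illustrated with a minimal sketch. This is not the IPMSWR itself: it omits the masking curve, substitutes a plain Wiener-like gain for the psychoacoustic weighting rule, and uses plain power spectral subtraction for the rough clean speech estimate. The function name, the band edges, the two floor values, the SNR threshold, and the choice of a lower floor in speech-absent bands are all assumptions made for illustration.

```python
import numpy as np

def snr_dependent_gain(noisy_psd, noise_psd, band_edges,
                       floor_speech=0.3, floor_noise=0.05,
                       snr_threshold_db=3.0):
    """Per-bin gain for one frame with a band-dependent residual noise floor.

    All parameter values are hypothetical and not taken from the paper.
    band_edges lists bin indices delimiting the bands and must span the frame.
    """
    eps = 1e-12
    # Rough clean-speech power via plain power spectral subtraction (assumption).
    clean_psd = np.maximum(noisy_psd - noise_psd, 0.0)
    # Wiener-like gain, standing in for the psychoacoustic weighting rule.
    gain = clean_psd / (clean_psd + noise_psd + eps)

    floor = np.empty_like(gain)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        # Segmental SNR of this band, in dB.
        band_snr_db = 10.0 * np.log10(
            (clean_psd[lo:hi].sum() + eps) / (noise_psd[lo:hi].sum() + eps)
        )
        # Speech-present bands keep a higher floor; noise-only bands a lower one.
        speech_present = band_snr_db > snr_threshold_db
        floor[lo:hi] = floor_speech if speech_present else floor_noise

    # The floor bounds the attenuation and so sets the residual noise level.
    return np.maximum(gain, floor)

noisy = np.array([10.0, 10.0, 1.0, 1.0])
noise = np.array([1.0, 1.0, 1.0, 1.0])
gains = snr_dependent_gain(noisy, noise, band_edges=[0, 2, 4])
```

In this toy frame the first band has a high segmental SNR and keeps its Wiener-like gain, while the second band contains only noise and is attenuated down to the lower floor.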