Thursday, December 29, 2005

More Papers(2)

Here are the other 2 papers that I read this week.

Suppression of Acoustic Noise in Speech Using Spectral Subtraction (S. F. Boll): This is truly a great paper, written in 1979. It describes the method my advisor uses to subtract the noise from the signals. To suppress the noise, the paper suggests taking a segment of the non-speech signal and using it to build a noise estimate that has the same statistical characteristics over the full length of the speech signal. Of course there are issues with the statistical assumptions. The first is that the noise is assumed to be locally stationary, meaning that its statistical properties are the same during the speech and non-speech segments. If the noise changes, the noise spectrum has to be recalculated. The basic procedure is: take the filtered and digitized speech; window it into half-overlapped data buffers (a Hanning window is used because, with half overlap, the windows sum to a constant and the signal can be perfectly reconstructed); calculate the magnitude spectra of the windowed data; subtract the spectral noise bias measured during non-speech activity; zero out any resulting negative amplitudes; apply secondary noise suppression (explained in the next paragraph); convert back to a time waveform; and overlap-add it to the previous data.
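
To make the procedure concrete, here is a minimal sketch of the basic subtraction loop in Python with NumPy. It is only an illustration, not the code from the paper. The function name spectral_subtract, the frame length of 256, the reuse of the noisy phase for resynthesis, and estimating the noise bias as the average magnitude over a separate noise-only segment are all assumptions of mine. The secondary suppression steps are left out.

```python
import numpy as np

def spectral_subtract(noisy, noise_only, frame_len=256):
    """Basic magnitude spectral subtraction with half-overlapped Hann frames.

    noisy      : 1-D array, noisy speech
    noise_only : 1-D array (at least frame_len samples) with no speech in it
    """
    hop = frame_len // 2
    # Periodic Hann window: half-overlapped copies sum to exactly 1,
    # which is what makes perfect reconstruction possible.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len)

    # Noise bias: average magnitude spectrum over the non-speech frames.
    noise_mags = [
        np.abs(np.fft.rfft(noise_only[i:i + frame_len] * window))
        for i in range(0, len(noise_only) - frame_len + 1, hop)
    ]
    noise_mag = np.mean(noise_mags, axis=0)

    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame_len + 1, hop):
        spectrum = np.fft.rfft(noisy[start:start + frame_len] * window)
        # Subtract the bias and zero out negative amplitudes.
        mag = np.maximum(np.abs(spectrum) - noise_mag, 0.0)
        # Keep the noisy phase, return to the time domain, overlap-add.
        frame = np.fft.irfft(mag * np.exp(1j * np.angle(spectrum)), n=frame_len)
        out[start:start + frame_len] += frame
    return out
```

With an all-zero noise estimate nothing is subtracted, so the interior of the output (everything covered by two overlapping frames) should match the input, which is a quick way to check the windowing.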

The secondary noise suppression methods are used to reduce the error between the predicted noise and the real noise. The spectral subtraction process is actually equivalent to applying a filter with a frequency response of [1 - (predicted_noise_freq_response / freq_response_of_noisy_speech)], with both responses taken as magnitudes. The methods are:
1) Magnitude averaging: time averaging the magnitude of the freq_response of the noisy speech over a period of time where the speech is assumed to be stationary.
2) Half-wave rectification: adding the magnitude of the filter to the filter itself and dividing the sum by 2. At frequencies where the noisy signal magnitude is less than the predicted noise magnitude, this sets the output magnitude to zero. The advantage is that the noise floor is reduced by the predicted noise magnitude. The disadvantage is that wherever the speech plus noise magnitude is less than the predicted noise magnitude, the speech information at that frequency is lost.
3) Residual noise reduction: after half-wave rectification, at frequencies where the remaining magnitude is below the maximum noise residual measured during non-speech activity, you look at the adjacent frames and replace the value with the smallest one.
4) Additional signal attenuation during non-speech activity: if, for a certain frame, the ratio of the subtracted signal to the estimated noise taken over all frequencies is less than -12 dB, that frame is considered non-speech activity. In those frames you just attenuate the signal by 30 dB.
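
Methods 2 to 4 are easy to sketch on arrays of magnitudes shaped (frames, bins). The snippet below is a rough illustration in Python with NumPy, not the paper's implementation. All function and parameter names are made up for the example, the frequency average in the last function is a plain mean over bins, and attenuating the noisy input frame (rather than the subtracted one) is my reading of the method.

```python
import numpy as np

def half_wave_rectify(noisy_mag, noise_mag):
    """Apply H_R = (H + |H|) / 2 with H = 1 - noise_mag / noisy_mag."""
    h = 1.0 - noise_mag / np.maximum(noisy_mag, 1e-12)  # guard against /0
    h_rect = (h + np.abs(h)) / 2.0   # negative filter values become zero
    return h_rect * noisy_mag

def residual_noise_reduction(mag, max_residual):
    """Where mag is below the maximum noise residual for that bin, replace it
    with the minimum over the previous, current, and next frame."""
    padded = np.pad(mag, ((1, 1), (0, 0)), mode="edge")  # repeat edge frames
    neighbor_min = np.minimum(np.minimum(padded[:-2], padded[1:-1]), padded[2:])
    out = mag.copy()
    below = mag < max_residual
    out[below] = neighbor_min[below]
    return out

def attenuate_non_speech(mag, noisy_mag, noise_mag,
                         threshold_db=-12.0, atten_db=-30.0):
    """Frames whose subtracted-signal-to-noise ratio is under threshold_db
    are treated as non-speech and replaced by the attenuated noisy frame."""
    ratio = np.mean(mag / noise_mag, axis=1)              # one value per frame
    level_db = 20.0 * np.log10(np.maximum(ratio, 1e-12))
    out = mag.copy()
    silent = level_db < threshold_db
    out[silent] = 10.0 ** (atten_db / 20.0) * noisy_mag[silent]
    return out
```

The three functions are meant to be chained in this order on the output of the magnitude averaging step, with noise_mag and max_residual both measured on frames known to contain no speech.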

This is a great paper and I'm really looking forward to seeing how my professor implemented spectral subtraction in the code.

The Generalized Correlation Method for Estimation of Time Delay (C. H. Knapp, G. C. Carter): Another very fundamental paper from the 1970s. The basic idea of this paper is pretty simple. Before cross-correlating the two input signals, prefilters are applied that aim to emphasize the signal at frequencies where the SNR is high and suppress it at frequencies where the SNR is low. The weightings presented are the Roth processor, the SCOT processor, PHAT (which I think is widely used) and the Eckart filter. The mathematics behind it isn't that simple, but there's no use in explaining it right now.
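
Even without the math, the PHAT variant is short enough to sketch. The Python/NumPy snippet below is a minimal illustration under my own assumptions (the function name gcc_phat_delay, the zero-padding length, and the small constant that guards the division are not from the paper). PHAT divides the cross-spectrum by its own magnitude, so only the phase is kept before transforming back to a correlation.

```python
import numpy as np

def gcc_phat_delay(x, y, fs=1.0):
    """Estimate by how many seconds y lags x, using the PHAT weighting."""
    n = len(x) + len(y)                 # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)              # cross-spectrum of the two signals
    cross /= np.maximum(np.abs(cross), 1e-12)   # PHAT: keep only the phase
    cc = np.fft.irfft(cross, n=n)       # generalized cross-correlation
    max_shift = n // 2
    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = np.argmax(cc) - max_shift     # peak position gives the delay
    return lag / fs
```

For example, if y is a copy of x delayed by 7 samples and fs is left at 1.0, the function should return a lag of 7. Swapping the PHAT line for a different weighting would give the other processors in the paper.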

That's basically it for this holiday week. Tomorrow I'm going back to Providence (I've been in Boston since last Saturday), and until Monday I'll look at the code and try to relate the method suggested in the draft paper to the code itself. So my next post will probably be a week from now, when I should have some results in hand.
