More Papers
In my last post, I wrote about a paper that corrects the baseline algorithm for determining talker orientation. It's not actually a paper yet; it's a method that my advisor thinks will work. He has already written a lot of MATLAB code to implement it, and I'll be doing the tests. Here is a little about the method and the paper:
An Improved Energy Method for Estimating Talker Orientation from Large Aperture Microphone Array Data (H F Silverman): The differences between this method and the baseline method are:
1) This method removes background noise fully using spatial subtraction (later in this post, I'll talk about a paper that does this).
2) The energy measure for orientation is improved by using high-frequency, more directional data and by forming a ratio of front-to-back high-frequency energy.
3) An effort is made to subtract the reverberant component of the energy ratio before correcting for the distance. (The previous paper corrected for the distance without removing the reverberant component, which caused a lot of problems when the speaker was near the corners.)
The method, very briefly, goes like this. The signal received at the microphone is modeled as having a direct component, a reflected component and a noise term. In the ideal condition, with no background noise and no reflections, we are left with the direct wave component: the original source signal delayed by time T, convolved with the source impulse response, which is also delayed by T and depends on the angle to the microphone, with the whole thing attenuated by the distance from the source to the mic. Once you compensate for the delay and the attenuation, you have a modified mic signal, which is the original source signal convolved with the source impulse response and multiplied by a constant (which comes from the attenuation factor). Then, for each of the modified microphone signals, we take the energy over a frame of length K, draw the radiation pattern and find the orientation.
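To make the steps above concrete, here is a minimal sketch of the ideal (noise-free, reflection-free) case. It is not my advisor's MATLAB code. The 1/r amplitude loss, the speed of sound value, the function names and the energy-weighted mean direction used in place of actually drawing the radiation pattern are all my own assumptions for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def compensated_frame_energy(mic_signal, distance_m, sample_rate, frame_start, frame_len):
    """Energy of one frame after undoing the propagation delay and an assumed 1/r loss."""
    # Delay T expressed in whole samples (fractional delays are ignored in this sketch).
    delay = int(round(distance_m / SPEED_OF_SOUND * sample_rate))
    start = frame_start + delay
    frame = mic_signal[start:start + frame_len]
    # Multiplying by the distance undoes the assumed 1/r amplitude attenuation.
    return sum((distance_m * x) ** 2 for x in frame)


def estimate_orientation(mic_angles_rad, energies):
    """Energy-weighted mean direction: a simple stand-in for fitting the radiation pattern.

    mic_angles_rad are the directions of the microphones as seen from the talker.
    """
    x = sum(e * math.cos(a) for a, e in zip(mic_angles_rad, energies))
    y = sum(e * math.sin(a) for a, e in zip(mic_angles_rad, energies))
    return math.atan2(y, x)
```

With one energy per microphone of frame length K, estimate_orientation returns a single angle in radians. A real implementation would also have to deal with the noise and reverberation terms that this sketch leaves out.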
The source impulse response has 3 complicating components:
1) Radiated energy is not uniform with respect to orientation.
2) The radiation pattern has different frequency response magnitudes depending on orientation.
3) There are small delays due to occlusion by the head.
The calculations in this method only make use of the 1st of these properties of the source impulse response. However, the simulator described in the earlier post shows that the effects of the 2nd and 3rd last only a short time, so the errors aren't important for frames longer than 1 ms.
Joint Sound Localization and Orientation Estimation (B Mungamuru, P Aarabi): I found this paper while googling the words "speech orientation array". I like this paper because it treats localization and orientation as one problem - somehow, instinctively, I think this is the way to deal with these problems. I don't like this paper because it makes the problem look very simple. Overall it was well worth reading because it broadened my perspective.
The method is as follows: There is an attenuation function that depends on the orientation of the source. There is an attenuation function that depends on the distance between the source and the microphone. Finally, there is an attenuation function that depends on the orientation of the microphone. This last one hasn't been considered in the LEMS papers; all of them treat the microphones as omnidirectional. When directionality is added to the mic, we get another degree of freedom, which lets us work on the two problems at once, and that might be a good idea.
However, first of all, the source and microphone attenuation functions are not really simple functions; they are impulse responses. Second of all, it is not that easy to model them as cos functions.
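As a rough picture of the three attenuation terms, here is a small 2-D sketch that multiplies a source-orientation gain, a distance gain and a microphone-orientation gain. The (1 + cos)/2 shapes and the 1/r spreading are my own hypothetical choices, not the exact functions from the paper, and they are frequency-independent gains, which is exactly the simplification I complained about above.

```python
import math


def source_gain(angle_off_axis_rad):
    """Hypothetical cosine-shaped talker directivity: 1 on axis, 0 directly behind."""
    return 0.5 * (1.0 + math.cos(angle_off_axis_rad))


def mic_gain(angle_off_axis_rad):
    """Hypothetical cosine-shaped microphone directivity, same shape as the source."""
    return 0.5 * (1.0 + math.cos(angle_off_axis_rad))


def distance_gain(distance_m):
    """Assumed spherical spreading: amplitude falls off as 1/r."""
    return 1.0 / distance_m


def total_gain(src_pos, src_heading_rad, mic_pos, mic_heading_rad):
    """Product of the three attenuation terms for one source/microphone pair."""
    dx, dy = mic_pos[0] - src_pos[0], mic_pos[1] - src_pos[1]
    r = math.hypot(dx, dy)
    bearing_to_mic = math.atan2(dy, dx)    # direction from source to mic
    bearing_to_src = math.atan2(-dy, -dx)  # direction from mic to source
    return (source_gain(bearing_to_mic - src_heading_rad)
            * distance_gain(r)
            * mic_gain(bearing_to_src - mic_heading_rad))
```

For example, a talker at (0, 0) facing a microphone at (2, 0) that points straight back at the talker gets both directional gains equal to 1, so only the 1/r term is left.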
Anyway, after modelling the attenuation and adding a simple delay into the picture, the paper writes one big matrix equation. Assuming that the source is Gaussian, it uses statistical methods to find the position and orientation that maximize the probability of observing that matrix. (Statistics is one of my weaker sides, so I'll probably need more help to understand the math behind this paper.) In the end it arrives at a search equation.
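To get a feel for what a joint search over position and orientation looks like, here is a toy grid search. It is not the paper's likelihood: I swapped in a plain least-squares match between observed and predicted amplitudes, with a hypothetical cosine-shaped source, 1/r spreading and omnidirectional mics, all of which are my assumptions.

```python
import math


def predicted_gain(x, y, heading, mic):
    """Hypothetical model: cosine-shaped source directivity times 1/r, omnidirectional mic."""
    dx, dy = mic[0] - x, mic[1] - y
    r = math.hypot(dx, dy)
    off_axis = math.atan2(dy, dx) - heading
    return 0.5 * (1.0 + math.cos(off_axis)) / r


def grid_search(observed, mics, xs, ys, headings):
    """Return the (x, y, heading) candidate with the smallest squared error."""
    best, best_err = None, float("inf")
    for x in xs:
        for y in ys:
            for h in headings:
                err = sum((o - predicted_gain(x, y, h, m)) ** 2
                          for o, m in zip(observed, mics))
                if err < best_err:
                    best, best_err = (x, y, h), err
    return best


# Usage: synthesize amplitudes for a known talker, then search a coarse grid.
mics = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 0.0), (1.0, 3.0)]
true_state = (1.0, 2.0, math.radians(90))
observed = [predicted_gain(*true_state, m) for m in mics]
xs = [0.5 * i for i in range(1, 8)]
ys = [0.5 * i for i in range(1, 6)]
headings = [math.radians(30 * k) for k in range(12)]
estimate = grid_search(observed, mics, xs, ys, headings)
```

The point of the sketch is only that position and orientation are scored together in one loop instead of being estimated one after the other. The grid must not contain a microphone position, since the 1/r term would divide by zero there.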
I have to note that while the paper assumes there is noise in the signal received at the mic, it doesn't take reverberation into account, which is another negative point.
Finally, the paper says that its method is a generalization of the delay-and-sum beamforming method, which again I don't think is an accurate claim. I might talk more about this paper if I can get my professor to read it and discuss it with me.
The Time-Delay and the Delayogram - New Visualizations for Time-Delay (H F Silverman, J M Sachar): I found this paper on our servers and thought it might be of interest to me. It proposes a way to visualize the delay as a delay vs. frequency graph (the time-delay graph), then it uses this graph to make a delay vs. elapsed time graph, which is more like an analog of the spectrogram - hence the name delayogram. I won't go into further detail; these might prove to be useful tools, but the methods behind them aren't very important to me at this stage.
Factors Affecting the Performance of Large-Aperture Microphone Arrays (H F Silverman, W R Patterson III, J Sachar): With all due respect, this has been the most boring paper I have read so far, and I will not go deeply into it either. It formulates the output of the array mathematically. From that, it formulates the beam pattern and then shows the results graphically. They can get the noise down 40 dB and the reverberation down 80 dB by adding 1000 outputs that carry the exact same signal as the input and then averaging them - well, I think I have to read this paper once again to really understand the results. I hope to come back with a better understanding.
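As a sanity check on the averaging idea, here is a small simulation of the textbook case only: every channel carries the same signal plus independent unit-variance Gaussian noise. Under that assumption, averaging M channels should lower the noise power by roughly 10*log10(M) dB, which would be about 30 dB for M = 1000. The paper's 40 dB and 80 dB figures presumably rest on its own setup, which I still have to understand, so this is not a reproduction of its results. The function name and the test signal are made up for illustration.

```python
import math
import random


def noise_reduction_db(num_channels, num_samples=2000, seed=0):
    """Measure how much averaging num_channels noisy copies lowers the noise power, in dB."""
    rng = random.Random(seed)
    signal = [math.sin(2 * math.pi * n / 50) for n in range(num_samples)]
    total = [0.0] * num_samples
    single_noise_power = 0.0
    for _ in range(num_channels):
        for n in range(num_samples):
            noise = rng.gauss(0.0, 1.0)  # independent noise on every channel
            single_noise_power += noise * noise
            total[n] += signal[n] + noise
    single_noise_power /= num_channels * num_samples
    # Average the channels, then measure what noise is left around the clean signal.
    averaged = [t / num_channels for t in total]
    residual_power = sum((a - s) ** 2 for a, s in zip(averaged, signal)) / num_samples
    return 10 * math.log10(single_noise_power / residual_power)
```

Calling noise_reduction_db(100) should give a value close to 20 dB. Correlated noise or reverberation would not average out this way, so the number only holds for the independent-noise assumption.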
Well, I'm a little tired. There are two more papers to write about; both are from the 1970s and are really good, especially one of them. I'll write about them either later tonight or tomorrow.
