<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-19588363</id><updated>2011-06-24T05:15:49.195-05:00</updated><title type='text'>Research at Lems</title><subtitle type='html'>an informative journal to keep track of what I've been doing.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://lemspeech.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19588363/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://lemspeech.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Avram</name><uri>http://www.blogger.com/profile/08521126594670714516</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>9</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-19588363.post-113891471836346740</id><published>2006-02-02T14:20:00.000-05:00</published><updated>2006-02-02T16:11:58.430-05:00</updated><title type='text'>late post</title><content type='html'>This post should have been up here by last Sunday at the latest.&lt;br /&gt;&lt;br /&gt;Last time I wrote that I was going to test whether our method works on simulator-generated data that doesn't include any noise or reverberation. 
I added a small part to the energy method that calculates, for each frame, the ratio of the energy above the threshold value to the total energy. If the ratio is less than 20 dB, I declare the frame useless and do not use it to find the orientation. All said and done, I got near-perfect results. I say "near-perfect" because I found 2 "minor" problems.&lt;br /&gt;&lt;br /&gt;1) When we are close to one set of microphones and facing the opposite direction (that is, the microphones we are aiming at are considerably distant), I got a bias of almost 10 degrees.&lt;br /&gt;&lt;br /&gt;2) In files that worked really well, there was usually one frame that went wrong, giving an error of 10-20 degrees, but only in that one frame. It was usually the same frame in every simulated file (remember that I use the same clean speech file for all of them), so I think it is caused by some special speech case.&lt;br /&gt;&lt;br /&gt;After these results, I started thinking about what to do next and came up with the following ideas:&lt;br /&gt;&lt;br /&gt;1) find out what is causing the bias described in problem 1.&lt;br /&gt;2) find out what is causing the bad frame described in problem 2.&lt;br /&gt;3) try to set up the discriminator without the clean speech.&lt;br /&gt;4) try to come up with ways to deal with the reverberant energies.&lt;br /&gt;5) try to come up with ways to use the energy method without position data.&lt;br /&gt;&lt;br /&gt;After discussing these with Prof. Silverman, I decided to attack the first problem. My first guess was that the simulator caused it. The orientation data the simulator uses suggests that the very high frequencies are not radiated straight ahead but slightly to the left or right, so I thought that at longer distances this caused the bias. I was dramatically wrong.&lt;br /&gt;&lt;br /&gt;In the original energy method, after the microphone energies are interpolated to angles, a front-to-back ratio is taken. 
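&lt;br /&gt;&lt;br /&gt;A minimal Python sketch of the frame-rejection rule from the first paragraph of this post is given below. It is an illustration, not the MATLAB code of the energy method: the function names and the per-frame (energy_above_cutoff, total_energy) input are hypothetical, and the sketch assumes the "threshold value" is the cutoff frequency. The post states the threshold as 20 dB; because the energy above the cutoff can never exceed the total frame energy, that ratio is at most 0 dB, so the sketch assumes the rule means "more than 20 dB below the total", that is, a threshold of -20 dB.
&lt;pre&gt;
```python
import math


def band_energy_ratio_db(energy_above_cutoff, total_energy):
    """Share of the frame energy that lies above the cutoff, in dB (at most 0 dB)."""
    if energy_above_cutoff == 0 or total_energy == 0:
        return -math.inf
    return 10.0 * math.log10(energy_above_cutoff / total_energy)


def usable_frames(frames, min_ratio_db=-20.0):
    """Indices of the frames kept for orientation estimation.

    frames: list of (energy_above_cutoff, total_energy) pairs, one per frame.
    min_ratio_db: hypothetical threshold; -20 dB is this sketch's reading of
    the "20 dB" rule in the post.
    """
    return [
        i
        for i, (high, total) in enumerate(frames)
        if band_energy_ratio_db(high, total) >= min_ratio_db
    ]


# Three made-up frames with 50%, 0.1% and 2% of their energy above the cutoff.
print(usable_frames([(0.5, 1.0), (0.001, 1.0), (0.02, 1.0)]))  # [0, 2]
```
&lt;/pre&gt;Frames that fail the test are simply left out when the orientation is estimated.&lt;br /&gt;&lt;br /&gt;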
I found out that when we are close to the microphones we are aiming at, the method fails if we do not take the front-to-back ratio; however, when we are far away from them, it fails if we do take it. The reason for this has to do with the heights of the microphones. I have set up the problem and will explain it in my next post. It's complicated, and it isn't something to do on the computer: it has to be worked out or thought through with pen and paper.&lt;br /&gt;&lt;br /&gt;Coming soon...!</content><link rel='replies' type='application/atom+xml' href='http://lemspeech.blogspot.com/feeds/113891471836346740/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19588363&amp;postID=113891471836346740' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19588363/posts/default/113891471836346740'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19588363/posts/default/113891471836346740'/><link rel='alternate' type='text/html' href='http://lemspeech.blogspot.com/2006/02/late-post.html' title='late post'/><author><name>Avram</name><uri>http://www.blogger.com/profile/08521126594670714516</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19588363.post-113799679963860376</id><published>2006-01-23T00:12:00.000-05:00</published><updated>2006-01-23T01:13:19.683-05:00</updated><title type='text'>Real problems - reverberant energies and a different corner effect.</title><content type='html'>The previous post is basically useless 
- in terms of its direct contribution to my research. However, the things I've been going through since the last post have really opened my mind and given me some invaluable experience in how to react to unexpected results. Basically, I always have to investigate the reasons behind the results, and when making observations or drawing conclusions, I always have to support my points with clear quantitative data.&lt;br /&gt;&lt;br /&gt;The energy method, as my professor has proposed it, is not working because of one major and one minor problem. The major problem: when the talker is close to a reverberant wall, the magnitude of the reverberant signals at distant microphones becomes comparable to that of the direct signals, and the squared-distance compensation of the received signal messes up the energy values. We get peaks at the wrong locations.&lt;br /&gt;&lt;br /&gt;The minor problem occurs when the talker looks directly at a corner (especially from locations far from that corner): the microphones slightly off the corner receive pretty much the same frequencies, and since they are closer to the talker, their magnitudes are slightly larger. The squared-distance compensation does not make the energies at the corner large enough, so we get almost no peaks within 5 degrees of error but almost all of them within 20 degrees.&lt;br /&gt;&lt;br /&gt;I'll be attacking the major problem first. To do this, last week I used parts of the code of the simulator my professor developed to get simulated HMA files separately for the direct and the reverberant speech, given a clean speech file. 
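&lt;br /&gt;&lt;br /&gt;The major problem above can be illustrated with a toy calculation in Python. All numbers are made up, and the model (direct energy falling off as 1/d^2, reverberant energy the same at every microphone) is an assumption of this sketch rather than the simulator's model; the function name is hypothetical.
&lt;pre&gt;
```python
def compensated_energy(direct_source_energy, reverb_energy, distance_m):
    """Frame energy after multiplying the received energy by distance squared.

    Toy model (an assumption of this sketch): the direct path loses energy as
    1/d**2, while the reverberant energy reaching the microphone does not
    depend on d. The d**2 factor restores the direct part but also scales the
    reverberant part by d**2.
    """
    received = direct_source_energy / distance_m ** 2 + reverb_energy
    return received * distance_m ** 2


# Made-up numbers: the talker faces a microphone 1 m away and has another one
# 5 m behind; the head radiates 10x less energy backwards, and both
# microphones pick up the same reverberant energy.
front = compensated_energy(1.0, 0.05, 1.0)
back = compensated_energy(0.1, 0.05, 5.0)
print(f"front {front:.3f}, back {back:.3f}")  # front 1.050, back 1.350
```
&lt;/pre&gt;Without the reverberant term the microphone the talker faces wins (1.0 against 0.1); with it, the microphone behind the talker wins, which is the kind of misplaced peak described above.&lt;br /&gt;&lt;br /&gt;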
This week I'll start by:&lt;br /&gt;&lt;br /&gt;1) using the direct-speech HMA files to test whether the energy method works without any reverberation for different types of speech (the different types of speech are a completely different area that I hope to talk about later);&lt;br /&gt;&lt;br /&gt;2) using the reverberant-speech HMA files to observe the reverberant energies at different positions and orientations.&lt;br /&gt;&lt;br /&gt;That's it for now.</content><link rel='replies' type='application/atom+xml' href='http://lemspeech.blogspot.com/feeds/113799679963860376/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19588363&amp;postID=113799679963860376' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19588363/posts/default/113799679963860376'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19588363/posts/default/113799679963860376'/><link rel='alternate' type='text/html' href='http://lemspeech.blogspot.com/2006/01/real-problems-reverberant-energies-and.html' title='Real problems - reverberant energies and a different corner effect.'/><author><name>Avram</name><uri>http://www.blogger.com/profile/08521126594670714516</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19588363.post-113708400973851981</id><published>2006-01-12T11:31:00.000-05:00</published><updated>2006-01-12T11:40:09.753-05:00</updated><title type='text'>First Results and First Problems</title><content type='html'>Well, I have worked on the 
data I was telling you about. There were some very weird results, so I took some other data, compared them, did some analysis and found this weird thing I call the "corner effect", caused by reverberant energy. Here is a copy of a document I originally wrote for Prof. Silverman but never gave him, since it's no big deal and the problem is very visible. However, so that I can keep track of what I'm doing in the future, I will post it here:&lt;br /&gt;--------------------------------------------0---------------------------------------&lt;br /&gt;&lt;br /&gt;I have used 2 sets of 5 recordings and processed them with the energy method code using different input parameters. Before explaining the procedure, the results and my ideas about them, I have to explain the structure of the recordings.&lt;br /&gt;&lt;br /&gt;Below is an approximate (rough, definitely not accurate) aerial layout of the room.&lt;br /&gt;&lt;a href="http://photos1.blogger.com/blogger/5015/1941/1600/room.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/5015/1941/320/room.jpg" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;The method does not consider height as a factor, so I won’t take it into account, but it should be noted that not all sets of microphones are at the same height.&lt;br /&gt;&lt;br /&gt;The first set of data (from now on referred to as the old files or old data) is HFS standing at the 5 locations marked in red and saying nearly the same sentence while looking at pt 1. The second set of data (from now on referred to as the new files or new data) is AL standing at the same 5 locations but looking at pt 2. So in total we have 10 HMA files, 2 at each location, with exactly opposite orientations. 
For each HMA file, I ran the energy method for 8 different cutoff frequencies, from 0 Hz up to 3500 Hz in steps of 500 Hz, both with and without spectral subtraction. Each run gives 6 percentage values: the 5, 10 and 20 degree tolerance percentages for both the raw and the smoothed data. The correct angle is 45 degrees for the new data and 225 degrees for the old data, and I’m looking at the polar plots for a cutoff frequency of 2000 Hz.&lt;br /&gt;&lt;br /&gt;My original aim was to observe the strength of the energy method as a function of the cutoff frequency, spectral subtraction (with and without) and the position of the talker. However, I’ve encountered a phenomenon which I’ll refer to as the corner effect. The corner effect is the result of reverberant energy due to short reflections at the corners. I hope it will become clear as I analyze the data for 3 different points.&lt;br /&gt;&lt;br /&gt;By the way, to pursue my original aim, I took new data today with the talker facing the midpoint of the horizontal wall and the empty space across from it, on a line that divides the room vertically in half.&lt;br /&gt;&lt;br /&gt;Closest to Orientation Point:&lt;br /&gt;&lt;br /&gt;Right now, I am looking at the old file from location 1 (looking at pt 1) and the new file from location 5 (looking at pt 2). The old data gives terrible results: at most 20% with 20 degrees of tolerance and less than 5% with 5 and 10 degrees of tolerance, for the spectrally subtracted raw energies. All other percentages are practically zero. The new data is slightly better, going up to 60% without and up to 50% with spectral subtraction when the energies aren’t smoothed. When they are smoothed, we get 0s everywhere. The only comparison that can tell us what’s going on is between the percentages found by maximizing the raw energies of the spectrally subtracted data (since everything else is zero in the old data). 
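&lt;br /&gt;&lt;br /&gt;For reference, here is a Python sketch of how the 5, 10 and 20 degree tolerance percentages can be computed. It assumes (the document does not spell this out) that a percentage is the share of frames whose peak angle lies within +/- the tolerance of the correct angle, with the error measured around the circle so that, for example, 350 and 10 degrees are 20 degrees apart. The function names are hypothetical.
&lt;pre&gt;
```python
def angular_error_deg(estimate_deg, truth_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = (estimate_deg - truth_deg) % 360.0
    return min(diff, 360.0 - diff)


def tolerance_percentages(estimates_deg, truth_deg, tolerances=(5, 10, 20)):
    """Percentage of frames whose estimate lies within +/-tol degrees of the truth."""
    n = len(estimates_deg)
    result = {}
    for tol in tolerances:
        hits = sum(1 for e in estimates_deg if tol >= angular_error_deg(e, truth_deg))
        result[tol] = 100.0 * hits / n if n else 0.0
    return result


# Made-up per-frame peak angles for a talker whose true orientation is 45 degrees.
print(tolerance_percentages([44, 52, 60, 330, 30], 45))
# {5: 20.0, 10: 40.0, 20: 80.0}
```
&lt;/pre&gt;Measuring the error around the circle matters here: a peak at 330 degrees is 75 degrees away from 45, not 285.&lt;br /&gt;&lt;br /&gt;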
There are 2 important points to notice: 1) the 20-degree tolerance percentage is much lower in the old data than in the new data, and 2) in the old data the percentages grow roughly exponentially with the tolerance, while in the new data they hardly vary. When I looked at the polar plot of the old data, I saw that most of the maximum energies are around 170 degrees and only a few go above 200. &lt;br /&gt;&lt;a href="http://photos1.blogger.com/blogger/5015/1941/1600/roomclosest.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/5015/1941/320/roomclosest.jpg" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;The polar plots of the new data show some interesting results. In the new data we have around 60 frames. The raw energies peak very near the orientation point for the first 30 frames. However, in those 30 frames the smoothed energies peak around 330 degrees, which probably means that a lot of reverberant energy reflected from the horizontal wall has accumulated. The second half is even more interesting. The raw energies peak first around 290 degrees, then around 170 degrees and finally at 260 degrees, while the smoothed energy peaks around 250 degrees for the entire second half. This is clearly the effect of compensating the energies by the distance squared, which blows the reverberant energies (even from across the room) up far beyond what they should be.&lt;br /&gt;&lt;br /&gt;Middle of the Room (location 3):&lt;br /&gt;&lt;br /&gt;This is a comparison of the old data and the new data recorded at location 3. The old data is very unsuccessful without spectral subtraction. With spectral subtraction, both raw and smoothed energies yield up to 40% at +/-20 degrees of tolerance. 
The method works only with cutoff frequencies above 1500 Hz, and the percentages vary linearly with the tolerance. For the new data, the success varies with the cutoff frequency and with spectral subtraction (with or without). I’ll look at the results with spectral subtraction for cutoff frequencies above 1500 Hz. At 20 degrees of tolerance, the raw data is around 70% and the smoothed data around 80%. The difference from the old data is that the percentages vary roughly exponentially with the tolerance.&lt;br /&gt;&lt;br /&gt;In the old data, the angle of the maximum raw energy starts out fairly random for the first 10 frames, then settles around 200 degrees for the next 20, then stays around 170 for 15 frames with some random ones between 180 and 260, then around 200 for the next 10, and finally the remaining ones are either around 180 or 225. The angle of the maximum smoothed energy starts around 250, then the next 20 are around 200, then 15 are between 180 and 225, then 10 are at 200 and 5 around 225.&lt;br /&gt;&lt;a href="http://photos1.blogger.com/blogger/5015/1941/1600/roommidpoint.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/5015/1941/320/roommidpoint.jpg" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;The new data gives clearer results: in the smoothed energies, we get 55 peaks around 30 degrees and then 10 around 250 degrees. The raw data gives the same 55 peaks around 30 degrees, but the next 10 are around 150 degrees. I think that, as a first step, to get rid of the peaks in these different places we have to use as high a cutoff frequency as possible (because higher frequencies are more directional). We will still get peaks at a wrong angle, but at least it will be one wrong angle instead of many. To check this, I ran the same file with a cutoff frequency of 3500 Hz. 
It didn’t work as well as I had hoped, but instead of peaks at 250 and 150 degrees I got the 10 peaks around 310 degrees, which is still weird.&lt;br /&gt;&lt;br /&gt;Farthest:&lt;br /&gt;&lt;br /&gt;For the farthest location I’m only going to look at the new file, with the talker looking at pt 2, since from what I’ve written so far it is clear that the new files are the better data.&lt;br /&gt;&lt;br /&gt;Comparing no spectral subtraction with spectral subtraction, I see that although the 20-degree values are slightly better without it, the 5- and 10-degree values are zero without it, while we at least get something with it. So this is my second point: spectral subtraction is probably good for getting around the corner effect.&lt;br /&gt;&lt;br /&gt;When I looked at the polar plot as usual at a cutoff frequency of 2000 Hz, almost all the raw and smoothed maximum energies were around 30 degrees, with only 1 above 35. To check that my first point is correct, I ran the same data at 3500 Hz, which improves the percentages at 5 and 10 degrees of tolerance. This time more of them are closer to 45 degrees.&lt;br /&gt;-----------------------------------------0------------------------------------------&lt;br /&gt;&lt;br /&gt;That's it. 
I hope there will be more to come by this weekend.</content><link rel='replies' type='application/atom+xml' href='http://lemspeech.blogspot.com/feeds/113708400973851981/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19588363&amp;postID=113708400973851981' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19588363/posts/default/113708400973851981'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19588363/posts/default/113708400973851981'/><link rel='alternate' type='text/html' href='http://lemspeech.blogspot.com/2006/01/first-results-and-first-problems.html' title='First Results and First Problems'/><author><name>Avram</name><uri>http://www.blogger.com/profile/08521126594670714516</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19588363.post-113675242320029245</id><published>2006-01-08T14:56:00.000-05:00</published><updated>2006-01-08T15:39:27.353-05:00</updated><title type='text'>Finally, some action</title><content type='html'>After all the previous posts about endless papers, this one finally has some action in it. Professor Silverman gave me the code he wrote for the energy method, which determines the talker orientation from the microphone array data. The code consists of a loop that does spectral subtraction (according to S. F. Boll's paper, described in the previous post) and calculates the energy above a given cutoff frequency for each microphone. 
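&lt;br /&gt;&lt;br /&gt;The per-microphone energy computation can be sketched in Python as follows. This is not the professor's code: it assumes a rectangular window, one frame per microphone and a plain DFT written out by hand (so no third-party library is needed), and it leaves out the spectral subtraction step. The function names are hypothetical.
&lt;pre&gt;
```python
import cmath
import math


def energy_above_cutoff(frame, fs_hz, cutoff_hz):
    """Sum of |X[k]|**2 over the DFT bins of one frame at or above cutoff_hz.

    Only bins 0..N/2 (the non-negative frequencies) are summed, which is
    enough for comparing microphones against each other.
    """
    n = len(frame)
    total = 0.0
    for k in range(n // 2 + 1):
        if k * fs_hz / n >= cutoff_hz:
            xk = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            total += abs(xk) ** 2
    return total


def per_mic_energies(frames_by_mic, fs_hz, cutoff_hz):
    """One high-frequency energy value per microphone for the same frame."""
    return [energy_above_cutoff(f, fs_hz, cutoff_hz) for f in frames_by_mic]


fs = 8000
n = 64
tone_2k = [math.sin(2 * math.pi * 2000 * t / fs) for t in range(n)]
tone_500 = [math.sin(2 * math.pi * 500 * t / fs) for t in range(n)]
# The 2 kHz tone passes a 1500 Hz cutoff; the 500 Hz tone contributes almost nothing.
print([round(e, 3) for e in per_mic_energies([tone_2k, tone_500], fs, 1500)])  # [1024.0, 0.0]
```
&lt;/pre&gt;Raising cutoff_hz keeps only the more directional high-frequency part of the speech.&lt;br /&gt;&lt;br /&gt;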
The minimum energy among all the microphones in each frame is subtracted from all the microphones in that frame to partly correct for reverberation. This data is then interpolated to give an energy value for all angles in increments of 0.1 degrees. The front-to-back ratio of these energies is taken. Finally, the data is low-pass filtered to smooth it. The angle with the greatest front-to-back ratio is taken to be the orientation angle.&lt;br /&gt;&lt;br /&gt;Last week, I first tried to understand the code. One problem was that the method and the testing were implemented in the same .m file; I prefer to keep them separate, so I first tried to pull out the spectral subtraction code. While doing that I ended up writing my own code, and its result was different from the professor's. After trying to figure out what was going on, I found a missing if statement in mine; I corrected it and it became unbelievably good. I also pulled the method out into a function with input and output parameters.&lt;br /&gt;&lt;br /&gt;Currently we have 5 recordings from the microphone array. In all of them the talker faces the same corner, and the 5 talker positions lie on the diagonal running from that corner to the opposite corner, nearly evenly spaced.&lt;br /&gt;&lt;br /&gt;Finally, yesterday I wrote test code to evaluate the 5 recordings with and without spectral subtraction and for different cutoff frequencies. 
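&lt;br /&gt;&lt;br /&gt;The steps after the energy computation (minimum subtraction, interpolation to 0.1 degree steps, front-to-back ratio, picking the largest ratio) can be sketched for a single frame as below. Several details are assumptions of this sketch rather than facts about the MATLAB code: the interpolation is linear around the circle, each microphone's angle is its bearing as seen from the known talker position, the front-to-back ratio at angle theta is taken as E(theta) / E(theta + 180), and a small eps avoids division by zero. The low-pass smoothing step is left out because the post does not say whether it runs across angles or across frames. The function names are hypothetical.
&lt;pre&gt;
```python
import bisect


def interpolate_circular(mic_angles_deg, energies, step_deg=0.1):
    """Linearly interpolate per-microphone energies onto a full-circle angle grid."""
    pairs = sorted((a % 360.0, e) for a, e in zip(mic_angles_deg, energies))
    angles = [a for a, _ in pairs]
    values = [e for _, e in pairs]
    grid = []
    for k in range(round(360.0 / step_deg)):
        g = k * step_deg
        i = bisect.bisect_right(angles, g)
        prev_i, next_i = i - 1, i % len(angles)  # prev_i == -1 wraps to the last mic
        span = (angles[next_i] - angles[prev_i]) % 360.0
        w = ((g - angles[prev_i]) % 360.0) / span if span else 0.0
        grid.append(values[prev_i] * (1.0 - w) + values[next_i] * w)
    return grid


def estimate_orientation(mic_angles_deg, energies, step_deg=0.1, eps=1e-12):
    """Angle (degrees) with the largest front-to-back energy ratio for one frame."""
    floor = min(energies)
    corrected = [e - floor for e in energies]  # minimum-energy subtraction
    grid = interpolate_circular(mic_angles_deg, corrected, step_deg)
    n = len(grid)
    # Assumed definition: energy at theta over energy at theta + 180 degrees.
    ratio = [grid[k] / (grid[(k + n // 2) % n] + eps) for k in range(n)]
    best_k = max(range(n), key=ratio.__getitem__)
    return round(best_k * step_deg, 6)


# Four hypothetical microphones around the talker, strongest at 90 degrees.
print(estimate_orientation([0, 90, 180, 270], [4, 10, 4, 1]))  # 90.0
```
&lt;/pre&gt;After the minimum subtraction, the weakest microphone drops to zero energy, so the ratio is largest for the angle directly opposite it; this is one reason the subtraction step matters so much for the result.&lt;br /&gt;&lt;br /&gt;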
Today I came in and realized that I had messed up the variable names in the test code, so I'll run the program right after posting this and get the baseline results.</content><link rel='replies' type='application/atom+xml' href='http://lemspeech.blogspot.com/feeds/113675242320029245/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19588363&amp;postID=113675242320029245' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19588363/posts/default/113675242320029245'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19588363/posts/default/113675242320029245'/><link rel='alternate' type='text/html' href='http://lemspeech.blogspot.com/2006/01/finally-some-action.html' title='Finally, some action'/><author><name>Avram</name><uri>http://www.blogger.com/profile/08521126594670714516</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19588363.post-113591750450923557</id><published>2005-12-29T23:04:00.000-05:00</published><updated>2005-12-29T23:38:24.536-05:00</updated><title type='text'>More Papers(2)</title><content type='html'>Here are the other 2 papers that I read this week.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;Suppression of Acoustic Noise in Speech Using Spectral Subtraction (S F Boll)&lt;/span&gt;: This is truly a great paper, written in 1979. The method in this paper is the one my advisor uses to subtract the noise from the signals. 
What this paper suggests for suppressing the noise is to take a part of the non-speech signal and estimate a noise spectrum with the same statistical characteristics over the full length of the speech signal. Of course, there are issues with the statistical assumptions. The first is that the noise is assumed to be locally stationary, meaning that its statistical properties are the same during the speech and non-speech segments. If they change, the noise spectrum has to be re-estimated. Basically, the procedure is: take the filtered and digitized speech; window it into half-overlapped data buffers (a Hanning window is used; since the buffers are half-overlapped, the signal can be perfectly reconstructed); compute the magnitude spectra of the windowed data; subtract the spectral noise bias measured during non-speech activity; zero out the resulting negative amplitudes; apply secondary noise suppression (explained in the next paragraph); form the time waveform; and overlap-add it to the previous data.&lt;br /&gt;&lt;br /&gt;The secondary noise suppression methods are used to reduce the error between the estimated noise and the actual noise. The spectral subtraction process is effectively a filter with the frequency response [1 - (estimated_noise_magnitude / noisy_signal_magnitude)]. The methods are:&lt;br /&gt;1) Magnitude averaging: averaging the magnitude spectrum of the noisy speech over a period of time during which the speech is assumed to be stationary.&lt;br /&gt;2) Half-wave rectification: adding the magnitude of the filter response to the filter response itself and dividing by 2. This means that at frequencies where the noisy signal magnitude is less than the estimated noise magnitude, the output is set to zero. The advantage of doing this is that the noise floor is reduced by the estimated noise magnitude. 
The disadvantage is that at frequencies where the speech-plus-noise magnitude is less than the estimated noise magnitude, the speech is basically lost.&lt;br /&gt;3) Residual noise reduction: after half-wave rectification, at frequencies where the output magnitude is below the maximum noise residual measured during non-speech activity, you look at the adjacent frames and replace the value with the smallest one.&lt;br /&gt;4) Additional signal attenuation during non-speech activity: if, for a frame, the ratio of the subtracted signal to the estimated noise, taken over all frequencies, is less than -12 dB, the frame is classified as non-speech activity. Those frames are simply attenuated by 30 dB.&lt;br /&gt;&lt;br /&gt;This is a great paper, and I'm really looking forward to seeing how my professor implemented spectral subtraction in the code.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;The Generalized Correlation Method for Estimation of Time Delay (C H Knapp, G C Carter)&lt;/span&gt;: Another very fundamental paper from the 1970s. The basic idea of this paper is pretty simple. Before the cross-correlation of the two input signals is computed, prefilters are applied that emphasize the signal at high-SNR frequencies and suppress it at low-SNR frequencies. The prefilters presented include the Roth processor, the SCOT processor, PHAT (which I think is widely used) and the Eckart filter. The mathematics behind them isn't that simple, but there is no use in explaining it right now.&lt;br /&gt;&lt;br /&gt;That's basically it for this holiday week. Tomorrow I'm going back to Providence (I have been in Boston since last Saturday), and until Monday I'll look at the code and try to relate the method suggested in the draft paper to the code itself. 
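&lt;br /&gt;&lt;br /&gt;A minimal Python sketch of the core of Boll's procedure, as summarized above: half-overlapped periodic Hann frames, subtraction of the average noise magnitude (the noise bias), half-wave rectification, reuse of the noisy phase, and overlap-add. It omits the secondary steps (magnitude averaging, residual noise reduction and non-speech attenuation), uses a naive DFT to stay self-contained, and is not my professor's implementation; all names are hypothetical.
&lt;pre&gt;
```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT, kept self-contained for illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns complex samples."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def hann(n):
    """Periodic Hann window: two half-overlapped copies add up to exactly 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]


def frame_starts(x, n):
    """Start indices of half-overlapped frames of length n that fit in x."""
    return range(0, len(x) - n + 1, n // 2)


def noise_bias(noise_only, n):
    """Average magnitude spectrum of windowed noise-only frames."""
    w = hann(n)
    mags = [[abs(v) for v in dft([a * b for a, b in zip(noise_only[s:s + n], w)])]
            for s in frame_starts(noise_only, n)]
    return [sum(col) / len(mags) for col in zip(*mags)]


def spectral_subtract(x, bias, n):
    """Subtract the noise bias, half-wave rectify, keep the noisy phase, overlap-add."""
    w = hann(n)
    out = [0.0] * len(x)
    for s in frame_starts(x, n):
        spectrum = dft([a * b for a, b in zip(x[s:s + n], w)])
        cleaned = [max(abs(v) - mu, 0.0) * cmath.exp(1j * cmath.phase(v))
                   for v, mu in zip(spectrum, bias)]
        for t, v in enumerate(idft(cleaned)):
            out[s + t] += v.real
    return out


n = 16
x = [math.sin(0.3 * t) + 0.01 * t for t in range(64)]
y = spectral_subtract(x, [0.0] * n, n)
# With nothing subtracted, samples covered by two frames are reconstructed exactly.
print(round(max(abs(y[t] - x[t]) for t in range(n // 2, len(x) - n // 2)), 9))  # 0.0
```
&lt;/pre&gt;The zero-bias check shows the perfect-reconstruction property of the half-overlapped window; in real use the bias would come from noise_bias applied to a stretch of non-speech signal.&lt;br /&gt;&lt;br /&gt;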
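&lt;br /&gt;&lt;br /&gt;Of the prefilters in Knapp and Carter's paper, PHAT is the simplest to sketch: the cross-spectrum is divided by its own magnitude so that only the phase is left, and the peak of the inverse transform gives the delay. The Python sketch below assumes integer-sample delays, zero-pads to avoid circular wrap-around and uses a naive DFT; the Roth, SCOT and Eckart weightings are not shown, and the names are hypothetical.
&lt;pre&gt;
```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def gcc_phat_delay(x, y):
    """Integer-sample delay of y relative to x (positive: y arrives later)."""
    n = len(x) + len(y)  # zero-pad so the circular correlation does not wrap
    X = dft(list(x) + [0.0] * (n - len(x)))
    Y = dft(list(y) + [0.0] * (n - len(y)))
    weighted = []
    for xk, yk in zip(X, Y):
        cross = yk * xk.conjugate()
        mag = abs(cross)
        # PHAT weighting: divide by the magnitude so only the phase is kept.
        weighted.append(cross / mag if mag > 1e-12 else 0.0)
    cc = [v.real for v in idft(weighted)]
    best = max(range(n), key=cc.__getitem__)
    # Indices above n/2 correspond to negative lags.
    return best - n if 2 * best > n else best


burst = [math.sin(1.7 * t * t) + 0.3 for t in range(10)]
x = burst + [0.0] * 22
y = [0.0] * 3 + burst + [0.0] * 19  # the same burst, 3 samples later
print(gcc_phat_delay(x, y), gcc_phat_delay(y, x))  # 3 -3
```
&lt;/pre&gt;Because every frequency gets the same weight after the division, the correlation peak becomes very sharp, which is part of why PHAT is popular for microphone arrays.&lt;br /&gt;&lt;br /&gt;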
So my next post will probably be a week from now, when I'll hopefully have some results in hand.</content><link rel='replies' type='application/atom+xml' href='http://lemspeech.blogspot.com/feeds/113591750450923557/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19588363&amp;postID=113591750450923557' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19588363/posts/default/113591750450923557'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19588363/posts/default/113591750450923557'/><link rel='alternate' type='text/html' href='http://lemspeech.blogspot.com/2005/12/more-papers2.html' title='More Papers(2)'/><author><name>Avram</name><uri>http://www.blogger.com/profile/08521126594670714516</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19588363.post-113589037075067457</id><published>2005-12-29T15:16:00.000-05:00</published><updated>2005-12-29T18:00:31.413-05:00</updated><title type='text'>More Papers</title><content type='html'>In my last post, I wrote about a paper that corrects the baseline algorithm for determining talker orientation. It's not a paper yet; it's a method that my advisor thinks will work. He has already written a lot of MATLAB code to implement it, and I'll be doing the tests. 
Here is a little about the method and the paper:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;An Improved Energy Method for Estimating Talker Orientation from Large Aperture Microphone Array Data (H F Silverman)&lt;/span&gt;: The differences between this method and the baseline method are:&lt;br /&gt;1) this method removes background noise using spectral subtraction (I'll write about a paper that does this later);&lt;br /&gt;2) the energy measure for orientation is improved by considering high-frequency, more directional data and by forming a ratio of front-to-back high-frequency energy;&lt;br /&gt;3) an effort is made to subtract the reverberant component of the energy ratio before correcting for distance (the previous paper corrected for distance without removing the reverberant component, which caused a lot of problems when the speaker was near the corners).&lt;br /&gt;&lt;br /&gt;Very briefly, the method works like this. The signal received at a microphone is modeled as having a direct component, a reflected component and a noise term. Assuming the ideal condition with no background noise or reflections, we are left with the direct component: the original source signal delayed by time T and convolved with the source impulse response (also delayed by T), which depends on the angle to the microphone, with the whole thing attenuated by the distance from the source to the microphone. So once we compensate for the delay and the attenuation, we have a modified microphone signal that is the original source signal convolved with the source impulse response, multiplied by a constant (which comes from the attenuation factor). 
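&lt;br /&gt;&lt;br /&gt;The delay and attenuation compensation described above can be sketched in Python. The sketch assumes spherical spreading (amplitude falling as 1/d), a speed of sound of 343 m/s and delays rounded to whole samples; it ignores the source impulse response, and the names are hypothetical. Multiplying the amplitude by d multiplies the frame energy by d^2.
&lt;pre&gt;
```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for room-temperature air


def compensate_direct_path(mic_signal, distance_m, fs_hz, c=SPEED_OF_SOUND_M_S):
    """Remove the propagation delay (rounded to whole samples) and the 1/d attenuation.

    Assumes the direct path is the source delayed by d/c and scaled by 1/d;
    the result is padded with zeros to keep the original length.
    """
    delay = round(distance_m / c * fs_hz)
    advanced = (list(mic_signal[delay:]) + [0.0] * delay)[: len(mic_signal)]
    return [distance_m * v for v in advanced]


def frame_energy(samples):
    """Energy of one frame (sum of squares)."""
    return sum(v * v for v in samples)


# Hypothetical source, simulated at a microphone 3.43 m away (10 samples at 1 kHz).
source = [1.0, 2.0, 3.0] + [0.0] * 17
d = 3.43
received = [0.0] * 10 + [v / d for v in source[:10]]
restored = compensate_direct_path(received, d, 1000.0)
print([round(v, 9) for v in restored[:4]])  # [1.0, 2.0, 3.0, 0.0]
```
&lt;/pre&gt;In a real room the received signal also contains reflections, and this compensation scales them by d as well.&lt;br /&gt;&lt;br /&gt;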
So now, for each of the modified microphone signals, we take the energy over a frame of length K, draw the radiation pattern and find the orientation.&lt;br /&gt;&lt;br /&gt;The source impulse response has 3 complicating components:&lt;br /&gt;1) the radiated energy is not uniform with respect to orientation;&lt;br /&gt;2) the radiation pattern has different frequency response magnitudes depending on orientation;&lt;br /&gt;3) there are small delays due to occlusion by the head.&lt;br /&gt;&lt;br /&gt;The calculations in this method make use of only the first of these properties; however, the simulator described in an earlier post shows that the effects of the 2nd and 3rd last only a short time, so the resulting errors aren't important for frames longer than 1 ms.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;Joint Sound Localization and Orientation Estimation (B Mungamuru, P Aarabi)&lt;/span&gt;: I found this paper while googling the words "speech orientation array". I like this paper because it treats localization and orientation as a whole; somehow, instinctively, I think this is the way to deal with these problems. I don't like this paper because it makes the problem look very simple. Overall it was well worth reading because it broadened my perspective.&lt;br /&gt;&lt;br /&gt;The method is as follows. There is an attenuation function that depends on the orientation of the source, an attenuation function that depends on the distance between the source and the microphone, and an attenuation function that depends on the orientation of the microphone. This last one hasn't been considered in the LEMS papers. 
All of them treat the microphones as omnidirectional. However, once the microphones are given a directional response, we get another degree of freedom, which lets us work on the two problems at once. That might be a good idea.&lt;br /&gt;&lt;br /&gt;However, first of all, the source and microphone attenuation functions are not simple gain functions; they are really impulse responses. Second, it is not that easy to model them as cosine functions.&lt;br /&gt;&lt;br /&gt;Anyway, after modelling the attenuation and adding a simple delay, the paper writes everything as one big matrix equation. Assuming that the source is Gaussian, it uses statistical methods to find the position and orientation that maximize the probability of the observed data. (Statistics is one of my weaker sides, so I'll probably need more help to understand the math behind this paper.) In the end it arrives at a search equation.&lt;br /&gt;&lt;br /&gt;I have to note that while the paper assumes there is noise in the signal received at the mic, it doesn't take reverberation into account, which is also a negative point.&lt;br /&gt;&lt;br /&gt;Finally, the paper says that its method is a generalization of delay-and-sum beamforming, which again I don't think is an accurate claim. I might talk more about this paper if I can get my professor to read it and discuss it with me.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;The Time-Delay and the Delayogram - New Visualizations for Time-Delay (H F Silverman, J M Sachar)&lt;/span&gt; I found this paper on our servers and thought it might be of interest to me. It proposes visualizing the delay as a delay-vs-frequency graph (the time-delay graph), and then uses this graph to build a delay-vs-elapsed-time graph, an analog of the spectrogram; hence the name delayogram. 
I won't get into further detail; they might prove to be useful tools, but the methods behind them aren't very important to me at this stage.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;Factors Affecting the Performance of Large-Aperture Microphone Arrays (H F Silverman, W R Patterson III, J Sachar)&lt;/span&gt; With all due respect, this has been the most boring paper I have read so far, so I won't go deeply into it either. It formulates the output of the array mathematically, derives the beam pattern from that, and then shows the results graphically. They can get the noise down by 40 dB and the reverberation by 80 dB by summing 1000 outputs carrying exactly the same signal as the input and then averaging them. Well, I think I have to read this paper once more to really understand the results. I hope to come back with a better understanding.&lt;br /&gt;&lt;br /&gt;Well, I'm a little tired. There are two more papers to write about; both are from the 1970s and are really good, especially one of them. 
I'll write about them either later tonight or tomorrow.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19588363-113589037075067457?l=lemspeech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lemspeech.blogspot.com/feeds/113589037075067457/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19588363&amp;postID=113589037075067457' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19588363/posts/default/113589037075067457'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19588363/posts/default/113589037075067457'/><link rel='alternate' type='text/html' href='http://lemspeech.blogspot.com/2005/12/more-papers.html' title='More Papers'/><author><name>Avram</name><uri>http://www.blogger.com/profile/08521126594670714516</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19588363.post-113523395604866578</id><published>2005-12-22T00:16:00.000-05:00</published><updated>2005-12-22T01:45:56.066-05:00</updated><title type='text'>Talker Orientation</title><content type='html'>I was hoping to write a little earlier than this, actually on Monday, right after the DSP final. Well, finals bring a lot of stress with them, and even though I wasn't physically tired at the end, I was toast because of the stress. The DSP final was a long one, and although the questions were doable if you had time to think, doing them in 3 hours was pretty tough. Anyway, I got a 71, which to my surprise turned out to be the top grade in the class. 
Although I think I could have done better than that (I lost a silly 5 points on a question that I had actually solved while studying for the exam), I'm happy that I got the top grade. I did really well on the math final too; I got a 41/50 and a solid A in that course. However, I really did screw up the digital design final; I got a lousy 59 and a B. I have no idea how the hell that happened. Anyway, as my advisor said today, 2 A's and a B is the next best thing to 3 A's. Compared to my undergrad degree, I think I have really improved in terms of course work, and I'll make it even better next semester. This was the first semester in which I decided to focus on my courses, and I did succeed.&lt;br /&gt;&lt;br /&gt;Back to research. I read the three (actually two and a half) papers. The half is the unfinished paper, which I will actually do some work on and finish (and hopefully get my name on it).&lt;br /&gt;&lt;br /&gt;&lt;em&gt;A Baseline Algorithm for Estimating Talker Orientation Using Acoustical Data From a Large-Aperture Microphone Array (J M Sachar and H F Silverman):&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;This paper aims to lay the groundwork for a basic method of finding the orientation of a talker who is surrounded by a microphone array. Apparently, if you record a talker in an anechoic chamber, the energy radiation pattern around the talker shows a 6.6 dB front-to-back ratio. So the paper claims that if one can detect this difference, then the orientation can be found, assuming the position is known a priori.&lt;br /&gt;&lt;br /&gt;The method goes like this. The sound that the jth mic records is the speech signal at its source convolved with an impulse response that depends on the positions of the talker and the mic, plus some background noise. The impulse response is defined as the direct speech, delayed by its travel time to the microphone and attenuated by the inverse square law, plus a reverberant component. 
To compensate for the time delay, the paper shifts the signal received at the mic backwards in time by the delay (which is known since the position is known), and then multiplies it by the distance to compensate for the inverse square law.&lt;br /&gt;&lt;br /&gt;It is also known that the impulse response of the mouth depends on frequency. Namely, the high-frequency components are more directional: they vanish as you move away from the direction the talker is facing, whereas the low frequencies are much less directional. The background noise and the reverberation components, if their magnitudes are comparable to the speech itself, have low-frequency characteristics (as this paper assumes) and may mask the dependence of the speech signal on the azimuthal direction. Therefore they apply a high-pass filter and then take the energy in a frame of T seconds. Needless to say, this is done for each of the microphones. The result is an equation of the form E = S + R1 + R2. The term of interest is S. For the method to work, the R terms should either be constant across mics, be very small, or vary the same way as S. The paper claims that R1 varies rapidly about zero as a function of time, so it can be removed with an LP filter, and that R2, which contains the reverberation energy, is constant over the mics since it depends on a very large number of random reflections. So once you apply the LP filter to E, the mic with the maximum filtered energy lies in the direction the talker is facing.&lt;br /&gt;&lt;br /&gt;This paper turns out to be quite insufficient and is only correct for certain very specific cases. 
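&lt;br /&gt;&lt;br /&gt;Here is a rough Python sketch of the baseline pipeline as I understand it, assuming the mic signals have already been delay- and distance-compensated. The first-difference high-pass filter and the moving-average low-pass filter are my own stand-ins for the paper's filters, and all the function names are hypothetical.&lt;br /&gt;&lt;pre&gt;
```python
def highpass(signal):
    """First-difference high-pass filter, y[n] = x[n] - x[n-1].

    Hypothetical stand-in for the paper's high-pass filter: it keeps the
    directional high frequencies and suppresses low-frequency noise/reverb.
    """
    return [b - a for a, b in zip(signal, signal[1:])]


def frame_energies(signal, frame_len):
    """Energy E = S + R1 + R2 in consecutive, non-overlapping frames."""
    return [sum(x * x for x in signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def lowpass(values, width):
    """Moving-average low-pass filter over frames, meant to average out R1."""
    return [sum(values[i:i + width]) / width
            for i in range(len(values) - width + 1)]


def facing_mics(compensated_signals, frame_len, width):
    """For each smoothed frame, the index of the mic with the most energy.

    compensated_signals -- one delay- and distance-compensated signal per mic
    """
    smoothed = [lowpass(frame_energies(highpass(s), frame_len), width)
                for s in compensated_signals]
    n_frames = min(len(e) for e in smoothed)
    return [max(range(len(smoothed)), key=lambda m: smoothed[m][t])
            for t in range(n_frames)]
```
&lt;/pre&gt;Picking the maximum per frame only works if R2 really is the same for every mic; if it isn't, the wrong mic can win, which fits with the method only being correct in special cases.&lt;br /&gt;&lt;br /&gt;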
I'll write more about it when I write about the paper that is a kind of sequel to this one, which is half written and on which I'll be doing the tests and experiments.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;A Speech-Source Simulator that Models the Source Transfer Function, Room Acoustics and Background Noise (H F Silverman, Y Yu)&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;This is a simple paper. A simulator was needed because the more complex the experiments become, the more we need controllable data. The signal received at a mic in a room is actually the speech convolved with the impulse responses of the talker, of the room and of the mic itself. The paper ignores the impulse response of the mic for the case of speech.&lt;br /&gt;&lt;br /&gt;The room is modeled as a rectangular room with the same reflection coefficient everywhere, and all the walls are assumed to be smooth. To get the room impulse response, the paper uses the image method, which creates imaginary rooms in every direction, each identical to the original room but mirrored with respect to its neighbors. Then a straight line is drawn from the mic in each of the imaginary rooms to the talker in the real room. Each time the line passes through a wall, its contribution is multiplied by the reflection coefficient. Using these paths, a room impulse response is created.&lt;br /&gt;&lt;br /&gt;The talker impulse response has an attenuation pattern that is a function of the angle from the normal to a source point, and these attenuation patterns are frequency dependent. There is already published data on these. There are also some time-delay effects due to the speaker's head. These effects are modeled in the paper, and a talker impulse response is created. There are results and sample impulse responses in the paper itself.&lt;br /&gt;&lt;br /&gt;I'll write about the third paper later on as I do some work on it. 
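&lt;br /&gt;&lt;br /&gt;Going back to the simulator paper, here is a bare-bones Python sketch of an image-method room impulse response for a rectangular room with one reflection coefficient for all walls. It is not the simulator's code. It follows the common textbook (Allen and Berkley) formulation, which mirrors the talker instead of the mic; by symmetry this gives the same path lengths. It also assumes 343 m/s for the speed of sound, rounds arrival times to whole samples and applies 1/d spreading to each path. The function name and parameters are hypothetical.&lt;br /&gt;&lt;pre&gt;
```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def room_impulse_response(room, src, mic, beta, fs, order, length):
    """Rectangular-room impulse response by the image method (sketch).

    room   -- (Lx, Ly, Lz) room dimensions in metres
    src    -- talker position (x, y, z); mic -- microphone position
    beta   -- one reflection coefficient shared by all walls
    fs     -- sampling rate in Hz; length -- number of output samples
    order  -- images are taken for n = -order..order in each dimension
    """
    h = [0.0] * length
    ns = range(-order, order + 1)
    for q in itertools.product((0, 1), repeat=3):
        for n in itertools.product(ns, repeat=3):
            # Position of this image of the talker and its wall-crossing count.
            image = [(1 - 2 * qi) * si + 2 * ni * li
                     for qi, si, ni, li in zip(q, src, n, room)]
            bounces = sum(abs(ni - qi) + abs(ni) for qi, ni in zip(q, n))
            d = math.dist(image, mic)
            k = round(d / SPEED_OF_SOUND * fs)  # arrival time in samples
            if k in range(length):  # keep only arrivals inside the window
                # beta once per wall crossing, 1/d spherical spreading.
                h[k] += beta ** bounces / d
    return h
```
&lt;/pre&gt;Each image path adds one tap, delayed by its travel time and scaled by the reflection coefficient once per wall crossing; the talker impulse response would then be convolved on top of this.&lt;br /&gt;&lt;br /&gt;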
I hope it'll turn out to be pretty neat work and that I'll get my name on it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19588363-113523395604866578?l=lemspeech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lemspeech.blogspot.com/feeds/113523395604866578/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19588363&amp;postID=113523395604866578' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19588363/posts/default/113523395604866578'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19588363/posts/default/113523395604866578'/><link rel='alternate' type='text/html' href='http://lemspeech.blogspot.com/2005/12/talker-orientation.html' title='Talker Orientation'/><author><name>Avram</name><uri>http://www.blogger.com/profile/08521126594670714516</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19588363.post-113459691682531884</id><published>2005-12-14T16:04:00.000-05:00</published><updated>2005-12-14T17:06:03.396-05:00</updated><title type='text'>First Meeting</title><content type='html'>I had two final exams on Monday and Tuesday. Both went better than I had expected, and I hope I did well on them. I really need some A's at this point.&lt;br /&gt;&lt;br /&gt;After the final on Tuesday, I went back to my office and found some really good food and a typical graduate-school lab party environment. Apparently, it's the 24th year of our lab, it's Christmas and it's Arpie's 71st birthday. 
Arpie is our lab's technician; he has been there since the beginning, and he's the only guy everyone trusts when it comes to constructing boards. Anyway, during the party I let my professor know that my next (and last) exam is on Monday, so we decided to have a research meeting.&lt;br /&gt;&lt;br /&gt;Two other people started with me this year. The first one is a Vietnamese guy called Hoang Do. He is from the City College of New York in Staten Island. He is a nice, quiet guy and he is already funded by the professor. The other guy is Matt Gillette (I hope the spelling is correct); he is from Trinity College in Hartford, Connecticut. He is a master's student like me. He is also a very quiet guy, and he is a lot more into hardware than Hoang or I.&lt;br /&gt;&lt;br /&gt;So the Professor, Hoang, Matt and I held our first research meeting this morning. First, the Prof. told us about the path to the Ph.D., the exams and so on, stuff that we already kind of knew. Then he told us what he has done over the years: he did a lot of work in computer architecture in the 70's and 80's, then he got into speech recognition, and finally in the mid-90's he built the largest microphone array, and he's now developing algorithms on it. He described his main goal as "acquiring speech remotely". I hope that at some point I will know a lot about what has been done toward that goal and what problems lie ahead. But it's too early for that now.&lt;br /&gt;&lt;br /&gt;Anyway, now I have a rough idea of what I'll be researching. The last student who graduated with a Ph.D. did some work on finding the orientation (the direction the speaker is looking) of a speaker using a large-aperture microphone array (large aperture means the array is a 360-degree array, so there are microphones all around the source). Apparently his thesis and paper only hold under some very specific conditions. So after he left, Prof. Silverman wanted to create a better method. 
However, to test his method he wanted to be able to control the data he would process, so he wrote a simulator that generates the sound of a speaker convolved with the room's impulse response and the speaker's mouth impulse response, plus some background noise.&lt;br /&gt;&lt;br /&gt;The Prof. gave me two and a half papers to read. The first one is the former Ph.D. student's work, the second one describes how he wrote the simulator, and the half is the introduction to his new method. I think I'll start by doing some of the testing for that method. So my aim right now is to read and understand all of the papers by next Monday, so that after the exam I can get right to work.&lt;br /&gt;&lt;br /&gt;My next post will probably be about the three papers and what I think about them.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19588363-113459691682531884?l=lemspeech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lemspeech.blogspot.com/feeds/113459691682531884/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19588363&amp;postID=113459691682531884' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19588363/posts/default/113459691682531884'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19588363/posts/default/113459691682531884'/><link rel='alternate' type='text/html' href='http://lemspeech.blogspot.com/2005/12/first-meeting.html' title='First Meeting'/><author><name>Avram</name><uri>http://www.blogger.com/profile/08521126594670714516</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19588363.post-113377220539538506</id><published>2005-12-05T03:24:00.000-05:00</published><updated>2005-12-05T03:43:25.403-05:00</updated><title type='text'>Introduction</title><content type='html'>&lt;span style="font-size:85%;"&gt;Hello,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;This is my second attempt at blogging, the first one being a total disaster.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;This blog will mainly be about my research at the Laboratory for Engineering Man/Machine Systems (&lt;a href="http://www.lems.brown.edu"&gt;LEMS&lt;/a&gt;), part of the engineering department at Brown University in Providence, RI. I am aiming to pursue a Ph.D. in the electrical sciences and computer engineering program. I came here to work with Prof. &lt;a href="http://www.engin.brown.edu/faculty/Silverman/"&gt;Harvey Silverman&lt;/a&gt; in August 2005. I was expecting to start doing some research in my first semester; however, the Professor was wise enough to let me concentrate on my courses, especially the DSP course that he teaches.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;The research is mostly about microphone arrays. A huge microphone array (about which I know nothing except that it is an array of microphones) has been installed in the last couple of years, and there has been a lot of work going on with it. 
I assume I will be using it a lot in my speech research.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;Initially, I intend to keep this blog as a tool to track my progress; however, I think it may turn out to be an exciting source for people who are interested in the research subject, or in going to graduate school in the States or at Brown University.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;I have two important finals on the 12th and the 13th. I am expecting to have a meeting with the Prof. and start some preliminary work before Christmas arrives. That means I will have a much better idea of what's going on, so my posts will hopefully get a lot more informative.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;Till next time,&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19588363-113377220539538506?l=lemspeech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lemspeech.blogspot.com/feeds/113377220539538506/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19588363&amp;postID=113377220539538506' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19588363/posts/default/113377220539538506'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19588363/posts/default/113377220539538506'/><link rel='alternate' type='text/html' href='http://lemspeech.blogspot.com/2005/12/introduction.html' 
title='Introduction'/><author><name>Avram</name><uri>http://www.blogger.com/profile/08521126594670714516</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
