I'm currently working on a project that requires some DSP skills. I have to extract the audio from a movie and then, by analyzing it, determine when someone is speaking, much like a voice activity detector.
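I wrote the code in Java (yes, I know it's not the best choice), using only a library to extract the audio from the video and JLayer so that I can process the MP3.
I have two channels and I get the samples interleaved: LEFT0, RIGHT0, LEFT1, RIGHT1, LEFT2, RIGHT2, etc.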
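For illustration, splitting such an interleaved buffer into one array per channel could look like the minimal sketch below. It assumes 16-bit samples held in a `short[]`; the class and method names are hypothetical and not taken from any library.

```java
// De-interleave a stereo PCM buffer laid out as L0, R0, L1, R1, ...
class StereoSplit {
    // Returns {left, right}; assumes an even number of interleaved samples.
    static short[][] split(short[] interleaved) {
        int frames = interleaved.length / 2;
        short[] left = new short[frames];
        short[] right = new short[frames];
        for (int i = 0; i < frames; i++) {
            left[i] = interleaved[2 * i];       // even positions: left channel
            right[i] = interleaved[2 * i + 1];  // odd positions: right channel
        }
        return new short[][] { left, right };
    }
}
```

Calling `StereoSplit.split(buffer)[0]` would then give the left channel and `[1]` the right channel.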
So, this is what I've done so far:
- I put the samples for each channel in an array.
- I apply a Hamming window [N = 8192]:
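double w = 0.54 - 0.46 * Math.cos(2 * Math.PI * i / (buffer.length - 1)); // coefficient depends on the index i, not on the sample value
fftBuffer[i] = new Complex(w * buffer[i], 0); // windowed sample as the real part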
- I then perform a simple FFT on each channel and compute the squared magnitude:
mag = re * re + im * im;
After that, I convert to a log scale (dB): mag_dB = 10 * log10(mag);
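The steps above can be put together in a small sketch. It is only an illustration under stated assumptions: a naive O(N^2) DFT stands in for the FFT so that no library is needed, and the names `SpectrumSketch`, `hamming` and `powerDb` are hypothetical. With N = 8192 a real FFT would be the sensible choice.

```java
class SpectrumSketch {
    // Hamming coefficients: the window depends on the index i, not on the sample value.
    static double[] hamming(int n) {
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] = 0.54 - 0.46 * Math.cos(2 * Math.PI * i / (n - 1));
        }
        return w;
    }

    // Power spectrum in dB for bins 0 .. N/2 - 1, using a naive DFT
    // as a stand-in for the FFT (assumption made to keep the sketch self-contained).
    static double[] powerDb(double[] frame) {
        int n = frame.length;
        double[] w = hamming(n);
        double[] db = new double[n / 2];
        for (int k = 0; k < n / 2; k++) {
            double re = 0, im = 0;
            for (int i = 0; i < n; i++) {
                double x = w[i] * frame[i];              // windowed sample
                double angle = 2 * Math.PI * k * i / n;
                re += x * Math.cos(angle);
                im -= x * Math.sin(angle);
            }
            double power = re * re + im * im;            // squared magnitude
            db[k] = 10 * Math.log10(power + 1e-12);      // small offset avoids log10(0)
        }
        return db;
    }
}
```

Since the squared magnitude is already a power, `10 * log10` is the matching conversion; `20 * log10` would apply to the plain magnitude `sqrt(re^2 + im^2)`.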
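Since I'm "looking for voice" here, I need the frequency interval between 80 and 1000 Hz (the voice itself sits even lower, between 80 Hz and 255 Hz). So, from the FFT I only need the first N/2 bins.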
The frequency per bin (slot in the array resulting from the FFT) would be the sample rate (48 kHz) / N; that would be 48000 / 8192 ≈ 5.86 Hz per bin. So I only look in the array at the values from FFT_Result[14] to FFT_Result[171] (14 * 5.86 Hz ≈ 82 Hz; 171 * 5.86 Hz ≈ 1002 Hz), right?!
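The bin arithmetic can be checked with a few lines of code. This is a minimal sketch with hypothetical helper names; it rounds to the nearest bin, which is one possible convention. Note that 48000 / 8192 comes out at roughly 5.86 Hz per bin.

```java
class BinMath {
    // Width of one FFT bin in Hz.
    static double hzPerBin(double sampleRate, int fftSize) {
        return sampleRate / fftSize;
    }

    // Index of the bin whose center frequency is closest to freqHz.
    static int binFor(double freqHz, double sampleRate, int fftSize) {
        return (int) Math.round(freqHz * fftSize / sampleRate);
    }
}
```

With a 48000 Hz sample rate and N = 8192, `binFor(80, 48000, 8192)` gives 14 and `binFor(1000, 48000, 8192)` gives 171.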
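I looked at the frequency analyzer in Cool Edit Pro and all the amplitudes there are negative. In my case, the first values (the audio is background sound and direct speech) are negative, and after that they are all positive. Are they supposed to be negative? What am I missing here?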
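One possible explanation, stated as an assumption rather than a confirmed diagnosis: audio analyzers usually show levels relative to full scale (dBFS), so 0 dB is the loudest representable signal and everything else is negative. A spectrum computed from raw 16-bit sample values without such a reference would come out mostly positive. The sketch below normalizes against the magnitude a full-scale sine would produce after windowing; the names are hypothetical.

```java
class DbfsSketch {
    // Peak FFT magnitude of a full-scale sine after windowing: fullScale * sum(w) / 2.
    // For 16-bit audio, fullScale would be 32768.
    static double reference(double[] window, double fullScale) {
        double sum = 0;
        for (double w : window) sum += w;
        return fullScale * sum / 2;
    }

    // Level of one FFT bin relative to the reference, in dB.
    static double dbfs(double re, double im, double ref) {
        double power = re * re + im * im;
        return 10 * Math.log10(power / (ref * ref) + 1e-20); // offset avoids log10(0)
    }
}
```

With this normalization a bin at the reference magnitude reads about 0 dB and a bin at one tenth of it reads about -20 dB.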
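So far, from what I can tell by looking at the frequency analyzer and the phase analyzer in Cool Edit Pro, I need to set a threshold on this frequency range, i.e. some algorithm that checks whether the values in the frequency range stay above it, and that also determines whether the sound is centered. That last check is needed (I think) because, whenever someone speaks, the phase analysis always shows the sound as centered.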
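As a rough illustration of that idea, the sketch below combines a band-level threshold with a simple "centered" test. It swaps in a normalized correlation between the left and right channels as a stand-in for the phase analyzer, which is an assumption and not how Cool Edit Pro is known to work. All names and threshold parameters are hypothetical and would need tuning on real material.

```java
class VadSketch {
    // Mean level in dB over bins lo..hi (inclusive) of a dB spectrum.
    // Averaging dB values is a simplification; averaging power is another option.
    static double bandMeanDb(double[] db, int lo, int hi) {
        double sum = 0;
        for (int k = lo; k <= hi; k++) sum += db[k];
        return sum / (hi - lo + 1);
    }

    // Normalized correlation of the two channels over one frame:
    // close to +1 when both channels carry the same signal (centered sound).
    static double correlation(double[] left, double[] right) {
        double lr = 0, ll = 0, rr = 0;
        for (int i = 0; i < left.length; i++) {
            lr += left[i] * right[i];
            ll += left[i] * left[i];
            rr += right[i] * right[i];
        }
        double denom = Math.sqrt(ll * rr);
        return denom == 0 ? 0 : lr / denom;   // silent frame: report 0
    }

    // Frame is flagged only if the band is loud enough AND the sound is centered.
    static boolean looksLikeSpeech(double bandDb, double corr,
                                   double thresholdDb, double minCorr) {
        return bandDb > thresholdDb && corr > minCorr;
    }
}
```

A per-frame decision like this tends to flicker, so in practice it would probably need some smoothing over consecutive frames.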
I can't find a way to do this, and I'm completely confused about what I've done so far, because I don't know whether any of it is correct.
So, if you read all this, thank you for your patience. My questions are:
- have I done what I've done so far correctly?
- does the amplitude have to be negative?
- does anyone know how I can compute the phase for a number of samples?
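For reference on the last question, the phase of a single FFT bin is usually taken as the angle of its complex value, which `Math.atan2` gives directly. The sketch below also shows the phase difference between the left and right channel for the same bin. The class and method names are hypothetical, and whether this matches what a phase analyzer displays is an assumption.

```java
class PhaseSketch {
    // Phase of one FFT bin in radians, in the range (-pi, pi].
    static double phase(double re, double im) {
        return Math.atan2(im, re);
    }

    // Phase difference (left minus right) for the same bin, computed as the
    // angle of L * conj(R) so that the result is already wrapped to (-pi, pi].
    static double phaseDiff(double reL, double imL, double reR, double imR) {
        double re = reL * reR + imL * imR;
        double im = imL * reR - reL * imR;
        return Math.atan2(im, re);
    }
}
```

A difference near 0 across the voice bins would suggest the sound is the same in both channels, i.e. centered.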