Question

I'm currently working on a project that requires some DSP skills. I have to extract the audio from a movie and then, by analyzing it, determine when someone is speaking and when not, much like a voice activity detector.

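I've written the code in Java (I know it's not the best choice), using only a library to extract the audio from the video, and JLayer so that I can process the MP3.

I get the samples interleaved across my two channels: LEFT0, RIGHT0, LEFT1, RIGHT1, LEFT2, RIGHT2, and so on.

So, this is what I have done so far: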

I put the samples for each channel in an array.
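A minimal sketch of this de-interleaving step, assuming the decoder hands back 16-bit PCM as a `short[]` with left and right samples alternating. The class and method names are illustrative, not part of any library.

```java
/** Splits interleaved stereo PCM into per-channel arrays (illustrative sketch). */
final class Deinterleave {
    /**
     * Takes samples ordered L0, R0, L1, R1, ... and returns {left, right}.
     * Assumption: exactly two channels; a trailing unpaired sample is ignored.
     */
    static short[][] split(short[] interleaved) {
        int frames = interleaved.length / 2;
        short[] left = new short[frames];
        short[] right = new short[frames];
        for (int i = 0; i < frames; i++) {
            left[i] = interleaved[2 * i];       // even positions: left channel
            right[i] = interleaved[2 * i + 1];  // odd positions: right channel
        }
        return new short[][] { left, right };
    }
}
```

Calling `Deinterleave.split(samples)` gives one array per channel, which can then be windowed and transformed independently.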

I apply a Hamming window [N = 8192]:

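// Hamming weight for sample index i (i = 0 .. buffer.length - 1)
double w = 0.54 - 0.46 * Math.cos(2 * Math.PI * i / (buffer.length - 1));
// Store the windowed sample, not the bare window weight
fftBuffer[i] = new Complex(w * buffer[i], 0);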

I then perform a simple FFT on each channel and compute the squared magnitude (power) of each bin, mag = re^2 + im^2; after that, I convert it to a log scale (dB): mag_dB = 10 * log10(abs(mag));
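As a rough sketch of the dB conversion, the helper below computes the power of one complex bin and expresses it relative to a reference power. The `referencePower` parameter and the small floor that avoids `log10(0)` are assumptions added for illustration. Whether the result is negative or positive depends only on whether the bin power is below or above that reference.

```java
/** Power-to-decibel conversion for a single FFT bin (illustrative sketch). */
final class Decibels {
    /** Squared magnitude of a complex value re + j*im. */
    static double power(double re, double im) {
        return re * re + im * im;
    }

    /**
     * Power in dB relative to referencePower.
     * Assumption: a floor of 1e-20 is applied so that silence does not yield -Infinity.
     */
    static double powerDb(double re, double im, double referencePower) {
        double p = Math.max(power(re, im), 1e-20);
        return 10.0 * Math.log10(p / referencePower);
    }
}
```

With a reference of 1.0, a bin of magnitude 0.1 comes out near -20 dB and a bin of magnitude 10 near +20 dB, so unnormalized FFT output can easily give positive values.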

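Since I'm "looking for voice" here, I need the frequencies in the interval between 80 and 1000 Hz (or even just between 80 Hz and 255 Hz for the voice harmonics). So, from the FFT I only need the first N/2 values.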

The frequency per bin (slot in the array resulting from the FFT) would be the sample rate (48 kHz) / N; that would be 48000 / 8192 ≈ 5.86 Hz per bin. Since bin k is centered at k * 5.86 Hz, I would only look at the values from FFT_Result[14] (≈ 82 Hz) to FFT_Result[170] (≈ 996 Hz), right?
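The bin arithmetic can be checked with a few lines of code. This is only a sketch with made-up names; it assumes the usual FFT layout in which bin k is centered at k * sampleRate / N, so bin 0 is DC.

```java
/** Maps a frequency band to FFT bin indices (illustrative sketch). */
final class BinRange {
    /** Width of one FFT bin in Hz. */
    static double binWidth(double sampleRate, int fftSize) {
        return sampleRate / fftSize;
    }

    /** Lowest bin whose center frequency is at or above freqHz. */
    static int firstBinAtOrAbove(double freqHz, double sampleRate, int fftSize) {
        return (int) Math.ceil(freqHz * fftSize / sampleRate);
    }

    /** Highest bin whose center frequency is at or below freqHz. */
    static int lastBinAtOrBelow(double freqHz, double sampleRate, int fftSize) {
        return (int) Math.floor(freqHz * fftSize / sampleRate);
    }

    public static void main(String[] args) {
        double fs = 48000.0; // sample rate from the question
        int n = 8192;        // FFT size from the question
        System.out.println("Bin width: " + binWidth(fs, n) + " Hz");
        System.out.println("80 Hz   -> bin " + firstBinAtOrAbove(80.0, fs, n));
        System.out.println("1000 Hz -> bin " + lastBinAtOrBelow(1000.0, fs, n));
    }
}
```

Rounding inward (ceil for the lower edge, floor for the upper edge) keeps every selected bin inside the 80 to 1000 Hz band; rounding outward instead would be an equally defensible choice.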

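I took a look at the frequency analyzer in Cool Edit Pro, and all the amplitudes are negative. In my case, the first ones (the sound is in the background and isn't loud) are negative, and after that, they are all positive. Aren't they supposed to be negative? Am I missing something here?

So far, from what I've gathered by looking at the frequency analyzer and the phase analyzer in Cool Edit Pro, I need to set a threshold for this frequency range, that is, some algorithm that checks whether the values in the frequency range stay above it, and that determines whether the sound is centered. That last part has to be done (I think) because when someone speaks, the phase analysis always shows the sound as centered.

I haven't been able to find a way to do this, and I'm completely confused about what I've done so far, because I don't know whether it is correct.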

So, if you read all this, thank you for your patience. My questions are:
- Have I done what I've done so far correctly?
- Does the amplitude have to be negative?
- Does anyone know how I can compute the phase for a number of samples?

Answer 1

In dB, the amplitude can be negative or positive; it doesn't matter. What matters is the value relative to some threshold. I would base the threshold on the surrounding samples. Because the energy in spoken words goes up and down as syllables are spoken, a simple average (multiplied by some arbitrary factor that you'll have to play with to find what works well) would work fine as a threshold.
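One way to sketch such a threshold, assuming the band energy has already been computed once per analysis frame: compare each frame with the mean energy of its neighbours, scaled by a tunable factor. The names `halfWindow` and `factor` are illustrative, and the mean here includes the frame being tested, which is just one possible choice.

```java
/** Flags frames whose energy stands out from the local average (illustrative sketch). */
final class AdaptiveThreshold {
    /**
     * Frame i is active when its energy exceeds `factor` times the mean energy
     * of the frames within `halfWindow` positions on either side (clipped at the ends).
     * Assumption: frameEnergy holds one non-negative energy value per frame.
     */
    static boolean[] detect(double[] frameEnergy, int halfWindow, double factor) {
        boolean[] active = new boolean[frameEnergy.length];
        for (int i = 0; i < frameEnergy.length; i++) {
            int from = Math.max(0, i - halfWindow);
            int to = Math.min(frameEnergy.length - 1, i + halfWindow);
            double sum = 0.0;
            for (int j = from; j <= to; j++) {
                sum += frameEnergy[j];
            }
            double mean = sum / (to - from + 1);
            active[i] = frameEnergy[i] > factor * mean;
        }
        return active;
    }
}
```

The factor has to be found by experiment, as noted above; a larger `halfWindow` makes the threshold follow slow level changes rather than individual syllables.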

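As for the phase, you could first take the Hilbert transform and then use atan2 on the real and imaginary parts of each sample to estimate the instantaneous phase.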
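A minimal sketch of that idea, under some assumptions: the analytic signal (the original samples as the real part, their Hilbert transform as the imaginary part) is built in the frequency domain by zeroing the negative-frequency bins and doubling the positive ones. A naive O(N^2) DFT is used only to keep the example self-contained; a real FFT library would replace it for N = 8192. The class name is illustrative.

```java
/** Instantaneous phase via the analytic signal (illustrative sketch, naive DFT). */
final class InstantaneousPhase {
    /** Returns the phase in radians, in (-pi, pi], for every input sample. */
    static double[] phase(double[] x) {
        int n = x.length;
        double[] re = new double[n];
        double[] im = new double[n];
        // Forward DFT: X[k] = sum over t of x[t] * exp(-j*2*pi*k*t/n)
        for (int k = 0; k < n; k++) {
            for (int t = 0; t < n; t++) {
                double a = -2.0 * Math.PI * k * t / n;
                re[k] += x[t] * Math.cos(a);
                im[k] += x[t] * Math.sin(a);
            }
        }
        // Keep DC (and Nyquist when n is even), double the positive
        // frequencies, and zero the negative ones.
        for (int k = 1; k < n; k++) {
            double w;
            if (2 * k < n) {
                w = 2.0;
            } else if (2 * k == n) {
                w = 1.0;
            } else {
                w = 0.0;
            }
            re[k] *= w;
            im[k] *= w;
        }
        // Inverse DFT yields z[t] = x[t] + j*H{x}[t]; the phase is atan2(imag, real).
        double[] phi = new double[n];
        for (int t = 0; t < n; t++) {
            double zr = 0.0;
            double zi = 0.0;
            for (int k = 0; k < n; k++) {
                double a = 2.0 * Math.PI * k * t / n;
                double c = Math.cos(a);
                double s = Math.sin(a);
                zr += re[k] * c - im[k] * s;
                zi += re[k] * s + im[k] * c;
            }
            phi[t] = Math.atan2(zi / n, zr / n);
        }
        return phi;
    }
}
```

For a pure cosine that fits a whole number of cycles into the buffer, the returned phase simply ramps with the cosine's argument and wraps at pi.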

Answer 2

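Rather than looking at the phase of the individual channels, you can check the delay between the two channels. Assuming the same signal reaches both channels, the direction of a plausible source can be found from this inter-channel delay. Assuming an ear-to-ear distance of about 20 cm, this delay is at most 0.2 m / 340 m/s ≈ 0.59 ms, or about 30 samples at 48 kHz. If you compute the cross-correlation over this range of lags (30 samples), you should find a peak that indicates the direction of the source.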
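The lag search can be sketched as a direct (time-domain) cross-correlation over a small range of lags. The names are illustrative, and the sign convention (positive lag means the right channel is delayed relative to the left) is an assumption of this sketch.

```java
/** Estimates the delay between two channels by cross-correlation (illustrative sketch). */
final class ChannelDelay {
    /**
     * Returns the lag in [-maxLag, maxLag] that maximizes the sum of
     * left[t] * right[t + lag]. A positive result means the right channel
     * lags the left one. Samples shifted outside the buffers are skipped.
     */
    static int bestLag(double[] left, double[] right, int maxLag) {
        int n = Math.min(left.length, right.length);
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int lag = -maxLag; lag <= maxLag; lag++) {
            double score = 0.0;
            for (int t = 0; t < n; t++) {
                int u = t + lag;
                if (u >= 0 && u < n) {
                    score += left[t] * right[u];
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = lag;
            }
        }
        return best;
    }

    /** Upper bound on the delay in samples for a given spacing, rounded up. */
    static int maxLagSamples(double spacingMeters, double speedOfSound, double sampleRate) {
        return (int) Math.ceil(spacingMeters / speedOfSound * sampleRate);
    }
}
```

A lag near zero would then suggest a centered source, while a peak toward either end of the range suggests a source off to one side.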

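To find voice-like signals, you can compute the total energy in the 80-1000 Hz band and compare it against some reasonable value. You can do this either in the frequency domain, by summing the magnitudes of the bins from 80 to 1000 Hz, or in the time domain, using a band-pass filter and an RMS calculation.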
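The frequency-domain variant might look like the sketch below. It assumes a `power` array holding re^2 + im^2 for the non-negative-frequency bins of a real-input FFT, and it sums power rather than magnitude so that the result is an energy. The class name, parameter names, and the fixed threshold are illustrative.

```java
/** Sums spectral power inside a frequency band (illustrative sketch). */
final class BandEnergy {
    /**
     * Adds up power[k] for every bin k whose center frequency
     * k * sampleRate / fftSize lies in [lowHz, highHz].
     * Assumption: power[k] = re^2 + im^2 for bins 0 .. fftSize/2.
     */
    static double sum(double[] power, double sampleRate, int fftSize,
                      double lowHz, double highHz) {
        int first = Math.max(0, (int) Math.ceil(lowHz * fftSize / sampleRate));
        int last = Math.min(power.length - 1,
                            (int) Math.floor(highHz * fftSize / sampleRate));
        double total = 0.0;
        for (int k = first; k <= last; k++) {
            total += power[k];
        }
        return total;
    }

    /** True when the 80-1000 Hz band energy exceeds a caller-supplied threshold. */
    static boolean isVoiceLike(double[] power, double sampleRate, int fftSize,
                               double threshold) {
        return sum(power, sampleRate, fftSize, 80.0, 1000.0) > threshold;
    }
}
```

The threshold itself still has to be chosen, for instance relative to the energy of neighbouring frames rather than as a fixed constant.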

Answer 3

You have a double-sided transform. The midpoint is the DC component. A negative frequency is really a positive frequency that is 180 degrees out of phase! So, if you use the first half of the FFT values, with the negative frequencies, you need to shift the phase by pi to get an accurate picture of what is happening.

Alternatively, use the second half of the FFT values, where the frequencies are positive and the phase is correct.

I took a look at the frequency analyzer in Cool Edit Pro, and all the amplitudes are negative. In my case, the first ones (the sound is in the background and isn't loud) are negative, and after that, they are all positive. Aren't they supposed to be negative? Am I missing something here?
