How to determine the magnitude of a frequency and the phase angle from an audio sample?

I'm currently working on a project that requires some DSP skills. I must extract the audio from a movie and then, by analyzing it, determine when someone is speaking or not, much like a voice activity detector.

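I wrote the code in Java (I know it's not the best choice), and I only use libraries to extract the audio from the video, plus JLayer so that I can process the MP3.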

I get the samples interleaved across my two channels: LEFT0, RIGHT0, LEFT1, RIGHT1, LEFT2, RIGHT2, and so on.
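For reference, splitting such an interleaved buffer into two channels can be sketched as below. This is only an illustration: the class name `Deinterleave` is hypothetical, and it assumes the decoder hands back signed 16-bit PCM samples, which the question does not state.

```java
class Deinterleave {
    /**
     * Splits interleaved stereo samples (L0, R0, L1, R1, ...) into
     * {left, right}, scaled to the range [-1, 1).
     * Assumption: the input is signed 16-bit PCM.
     */
    static double[][] split(short[] interleaved) {
        int frames = interleaved.length / 2;
        double[] left = new double[frames];
        double[] right = new double[frames];
        for (int i = 0; i < frames; i++) {
            left[i] = interleaved[2 * i] / 32768.0;       // even indices: left
            right[i] = interleaved[2 * i + 1] / 32768.0;  // odd indices: right
        }
        return new double[][] { left, right };
    }
}
```

Working on normalized `double` arrays keeps the later steps (FFT, energy, correlation) independent of the sample format.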

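So, this is what I've done so far:

Because I'm "looking for voice" here, I need the frequencies between 80 and 1000 Hz (or even just between 80 Hz and 255 Hz for voiced speech). So, from the FFT I only need the first N/2 values.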

The frequency per bin (a slot in the array produced by the FFT) would be the sample rate (48 kHz) / N; that would be 48000 / 8192 = 5 Hz per bin. So I only look in the array at the values from FFT_Result[15] to FFT_Result[199] (16 * 5 Hz = 80 Hz; 200 * 5 Hz = 1000 Hz), right?
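As a quick check of that arithmetic: 48000 / 8192 is about 5.86 Hz rather than 5 Hz, so the 80 to 1000 Hz band lands on somewhat lower bin indices than the ones quoted. A minimal sketch of the mapping, assuming an FFT whose bin k is centered at k * sampleRate / N (the class and method names are hypothetical):

```java
class FftBins {
    /** Width of one FFT bin in Hz. */
    static double binWidthHz(double sampleRate, int fftSize) {
        return sampleRate / fftSize;
    }

    /** Index of the first bin whose center frequency is >= freqHz. */
    static int firstBinAtOrAbove(double freqHz, double sampleRate, int fftSize) {
        return (int) Math.ceil(freqHz * fftSize / sampleRate);
    }

    /** Index of the last bin whose center frequency is <= freqHz. */
    static int lastBinAtOrBelow(double freqHz, double sampleRate, int fftSize) {
        return (int) Math.floor(freqHz * fftSize / sampleRate);
    }
}
```

With a 48000 Hz sample rate and N = 8192, `firstBinAtOrAbove(80, ...)` gives 14 and `lastBinAtOrBelow(1000, ...)` gives 170.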

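I took a look at the frequency analyzer in Cool Edit Pro and all the amplitudes are negative. In my case, the first ones (the sound is in the background and isn't loud) are negative, and after that, they are all positive. Aren't they supposed to be negative? Am I missing something here?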

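So far, from what I have seen by looking at the frequency analyzer and the phase analyzer in Cool Edit Pro, I need to set a threshold for this frequency range, some kind of algorithm that checks whether the values stay within that frequency range, and a way to determine whether the sound is centered. That last part has to be done (I think) by analyzing the phase: when someone speaks, the sound is always centered.

I haven't managed to find a way to do this, and I'm completely confused about what I've done so far, because I don't know whether it is correct.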

So, if you read all this, thank you for your patience. My questions are:
- have I done what I've done so far correctly?
- does the amplitude have to be negative?
- does anyone know how I can compute the phase for a number of samples?

Best answer

In dB, the amplitude can be negative or positive; it doesn't matter. What matters is the value relative to some threshold. I would base the threshold on the surrounding samples. Because the energy in spoken words goes up and down as syllables are spoken, a simple average (multiplied by some arbitrary factor that you'll have to play with to find what works well) would work fine as a threshold.
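A minimal sketch of that idea follows. It is not a tuned detector: the frame size, the window `radius`, and the multiplier `factor` are hypothetical parameters you would have to experiment with, and the class name is made up for the example. The `toDb` helper also shows why dB values are often negative: anything below the reference level of 1.0 has a negative logarithm.

```java
class EnergyThreshold {
    /** Mean energy (mean of squared samples) of each non-overlapping frame. */
    static double[] frameEnergies(double[] samples, int frameSize) {
        int frames = samples.length / frameSize;
        double[] energies = new double[frames];
        for (int f = 0; f < frames; f++) {
            double sum = 0;
            for (int i = 0; i < frameSize; i++) {
                double s = samples[f * frameSize + i];
                sum += s * s;
            }
            energies[f] = sum / frameSize;
        }
        return energies;
    }

    /** Energy in dB relative to 1.0; the tiny offset avoids log10(0). */
    static double toDb(double energy) {
        return 10 * Math.log10(energy + 1e-12);
    }

    /**
     * Marks frame f as active when its energy exceeds factor times the
     * average energy of the frames in [f - radius, f + radius].
     */
    static boolean[] detect(double[] energies, int radius, double factor) {
        boolean[] active = new boolean[energies.length];
        for (int f = 0; f < energies.length; f++) {
            int from = Math.max(0, f - radius);
            int to = Math.min(energies.length - 1, f + radius);
            double sum = 0;
            for (int j = from; j <= to; j++) {
                sum += energies[j];
            }
            double mean = sum / (to - from + 1);
            active[f] = energies[f] > factor * mean;
        }
        return active;
    }
}
```

Because the threshold is relative to the local average, it does not matter whether the underlying values are expressed as raw energies or in dB, as long as the comparison is done consistently.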

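As for the phase, you can first apply a Hilbert transform and then use atan2 on the real and imaginary parts of each sample to estimate the instantaneous phase.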
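One way to sketch this is to build the analytic signal in the frequency domain (keep DC and Nyquist, double the positive frequencies, zero the negative ones) and then take atan2 of each complex sample. The example below uses a naive O(N^2) DFT purely to stay self-contained; in practice you would swap in whatever FFT routine you already have. The class name is hypothetical.

```java
class InstantaneousPhase {
    /** Returns {real, imag} of the analytic signal of x (naive DFT, O(N^2)). */
    static double[][] analyticSignal(double[] x) {
        int n = x.length;
        double[] re = new double[n];
        double[] im = new double[n];
        // Forward DFT of the real input.
        for (int k = 0; k < n; k++) {
            for (int t = 0; t < n; t++) {
                double a = -2 * Math.PI * k * t / n;
                re[k] += x[t] * Math.cos(a);
                im[k] += x[t] * Math.sin(a);
            }
        }
        // Keep DC (and Nyquist for even n), double positive frequencies,
        // zero negative frequencies.
        for (int k = 0; k < n; k++) {
            double w;
            if (k == 0 || (n % 2 == 0 && k == n / 2)) {
                w = 1;
            } else if (k < (n + 1) / 2) {
                w = 2;
            } else {
                w = 0;
            }
            re[k] *= w;
            im[k] *= w;
        }
        // Inverse DFT gives the complex analytic signal.
        double[] zr = new double[n];
        double[] zi = new double[n];
        for (int t = 0; t < n; t++) {
            for (int k = 0; k < n; k++) {
                double a = 2 * Math.PI * k * t / n;
                double c = Math.cos(a);
                double s = Math.sin(a);
                zr[t] += re[k] * c - im[k] * s;
                zi[t] += re[k] * s + im[k] * c;
            }
            zr[t] /= n;
            zi[t] /= n;
        }
        return new double[][] { zr, zi };
    }

    /** Instantaneous phase in radians, in (-pi, pi], for each sample. */
    static double[] phase(double[] x) {
        double[][] z = analyticSignal(x);
        double[] p = new double[x.length];
        for (int t = 0; t < x.length; t++) {
            p[t] = Math.atan2(z[1][t], z[0][t]);
        }
        return p;
    }
}
```

For a pure cosine the result is a phase that ramps linearly with time and wraps at pi; for speech it will be much less regular, so it is usually the phase difference between channels that is informative, not the absolute value.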

Other answers

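Instead of looking at the phase of the individual channels, you can check the delay between the two channels. Assuming the same signal reaches both channels, the direction of the sound source can be found from this inter-channel delay. Assuming an ear-to-ear distance of about 20 cm, this delay is at most 0.2 m / 340 m/s ≈ 0.59 ms, or about 30 samples at 48 kHz. If you compute the cross-correlation over this range of lags (30 samples), you should find a peak that indicates the direction of the source.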
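A rough sketch of that search, assuming the two channels are already available as separate arrays of equal sample rate. The class and method names are hypothetical, and the correlation is left unnormalized for simplicity. A centered source, such as dialogue mixed to the middle, should give a peak at or near lag 0.

```java
class ChannelDelay {
    /** Largest physically plausible delay, in samples (rounded up). */
    static int maxLagSamples(double earDistanceMeters, double speedOfSound,
                             double sampleRate) {
        return (int) Math.ceil(earDistanceMeters / speedOfSound * sampleRate);
    }

    /**
     * Returns the lag in [-maxLag, maxLag] that maximizes
     * sum(left[i] * right[i + lag]). A positive result means the right
     * channel is a delayed copy of the left channel.
     */
    static int bestLag(double[] left, double[] right, int maxLag) {
        int n = Math.min(left.length, right.length);
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int lag = -maxLag; lag <= maxLag; lag++) {
            double sum = 0;
            for (int i = 0; i < n; i++) {
                int j = i + lag;
                if (j >= 0 && j < n) {
                    sum += left[i] * right[j];
                }
            }
            if (sum > bestScore) {
                bestScore = sum;
                best = lag;
            }
        }
        return best;
    }
}
```

With 0.2 m, 340 m/s and 48000 Hz, `maxLagSamples` returns 29, which matches the "about 30 samples" estimate.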

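To find voice-like signals, you can compare the total energy in the 80-1000 Hz band against some reasonable threshold. You can do this either in the frequency domain, by summing the magnitudes of the bins from 80 to 1000 Hz, or in the time domain, by applying a band-pass filter and computing the RMS value.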
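Both variants can be sketched briefly. The frequency-domain version below sums squared magnitudes (energy) instead of plain magnitudes, which is a choice made for this example, and it evaluates only the needed bins with a direct DFT so that it stays self-contained. The time-domain helper computes the RMS of a frame and assumes the band-pass filtering has already been done elsewhere. All names are hypothetical.

```java
class BandEnergy {
    /**
     * Sum of squared DFT magnitudes over the bins whose center frequency
     * lies in [lowHz, highHz], capped at the Nyquist bin.
     */
    static double bandEnergy(double[] frame, double sampleRate,
                             double lowHz, double highHz) {
        int n = frame.length;
        int first = (int) Math.ceil(lowHz * n / sampleRate);
        int last = Math.min((int) Math.floor(highHz * n / sampleRate), n / 2);
        double energy = 0;
        for (int k = first; k <= last; k++) {
            double re = 0;
            double im = 0;
            for (int t = 0; t < n; t++) {
                double a = -2 * Math.PI * k * t / n;
                re += frame[t] * Math.cos(a);
                im += frame[t] * Math.sin(a);
            }
            energy += re * re + im * im;
        }
        return energy;
    }

    /** Root-mean-square value of a (band-pass filtered) frame. */
    static double rms(double[] frame) {
        double sum = 0;
        for (double s : frame) {
            sum += s * s;
        }
        return Math.sqrt(sum / frame.length);
    }
}
```

Either number can then be compared against a threshold, for example one derived from the average of the neighboring frames.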

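You have a double-sided transform. With the negative frequencies in the first half, the midpoint is the DC component. A negative frequency is really a positive frequency whose phase has the opposite sign: for a real signal, the negative-frequency half is the complex conjugate of the positive-frequency half. So, if you use the first half of the FFT values with negative frequencies, you need to negate the phase to get an accurate picture of what is happening.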

Alternatively, use the second half of the FFT values, where the frequencies are positive and the phase is already correct.





