C#: transcribe WAV file to text (speech-to-text) with System.Speech namespaces

How do you use the .NET speech namespace classes to convert audio in a WAV file to textual form which I can display on the screen or save to file?

I am looking for some tutorial samples.

UPDATE

I found a code sample here, but when I tried it, it gave incorrect results. Below is the VB code sample I've adapted. (Actually, I don't mind the language as long as it's either VB or C#.) I assumed that if we load the right grammar, i.e. the words we expect in the recording, we should get the textual output of that. First I tried sample words that are in the call. It sometimes printed only that (one) word and nothing else. Then I tried words which we do not expect in the recording at all. Unfortunately, it printed those too... :(

Imports System
Imports System.Speech.Recognition

Public Class Form1

    Dim WithEvents sre As SpeechRecognitionEngine

    Private Sub btnLiterate_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnLiterate.Click
        If TextBox1.Text.Trim.Length = 0 Then Exit Sub
        sre.SetInputToWaveFile(TextBox1.Text)
        Dim r As RecognitionResult
        r = sre.Recognize()
        If r Is Nothing Then
            TextBox2.Text = "Could not fetch result"
            Return
        End If
        TextBox2.Text = r.Text
    End Sub

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        TextBox1.Text = String.Empty
        Dim dr As DialogResult
        dr = OpenFileDialog1.ShowDialog()
        If dr = Windows.Forms.DialogResult.OK Then
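            ' Check the file extension rather than looking for "wav" anywhere in the path.
            If Not OpenFileDialog1.FileName.EndsWith(".wav", StringComparison.OrdinalIgnoreCase) Then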
                MessageBox.Show("Incorrect file")
            Else
                TextBox1.Text = OpenFileDialog1.FileName
            End If
        End If
    End Sub

    Public Sub New()

        ' This call is required by the Windows Form Designer.
        InitializeComponent()

        sre = New SpeechRecognitionEngine()

    End Sub

    Private Sub sre_LoadGrammarCompleted(ByVal sender As Object, ByVal e As System.Speech.Recognition.LoadGrammarCompletedEventArgs) Handles sre.LoadGrammarCompleted

    End Sub

    Private Sub sre_SpeechHypothesized(ByVal sender As Object, ByVal e As System.Speech.Recognition.SpeechHypothesizedEventArgs) Handles sre.SpeechHypothesized
        System.Diagnostics.Debug.Print(e.Result.Text)
    End Sub

    Private Sub sre_SpeechRecognitionRejected(ByVal sender As Object, ByVal e As System.Speech.Recognition.SpeechRecognitionRejectedEventArgs) Handles sre.SpeechRecognitionRejected
        System.Diagnostics.Debug.Print("Rejected: " & e.Result.Text)
    End Sub

    Private Sub sre_SpeechRecognized(ByVal sender As Object, ByVal e As System.Speech.Recognition.SpeechRecognizedEventArgs) Handles sre.SpeechRecognized
        System.Diagnostics.Debug.Print(e.Result.Text)
    End Sub

    Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        Dim words As String() = New String() {"triskaidekaphobia"}
        Dim c As New Choices(words)
        Dim grmb As New GrammarBuilder(c)
        Dim grm As Grammar = New Grammar(grmb)
        sre.LoadGrammar(grm)
    End Sub

End Class

UPDATE (after Nov 28th)

Found a way to load a default grammar. It goes something like this:

sre.LoadGrammar(New DictationGrammar)

There are still problems here. The recognition is not exact; the output is rubbish. For a 6-minute file it gives perhaps 5-6 words of text that are totally irrelevant to the voice file.

Best answer

The classes in System.Speech.Synthesis are for text to speech (primarily an accessibility feature).

You are looking for voice recognition. The System.Speech.Recognition namespace has been available since .NET 3.0. It uses the Windows Desktop Speech engine. This might get you started, but I guess there are better engines out there.

Voice recognition is very complicated and hard to do right. There are also some commercial products available.

Other answers

I realize this is an old question, but there is better information available in later questions and answers. For example, see "What is the best option for transcribing speech-to-text in a asp.net web app?"

Instead of calling SetInputToDefaultAudioDevice() you can call SetInputToWaveFile() to read from an audio file.

The desktop recognition engine that comes in Windows Vista and Windows 7 includes a dictation grammar as shown in the referenced answer.
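As a rough illustration of the two points above, here is a minimal C# console sketch that loads the dictation grammar and reads from a WAV file. It is not taken from the referenced answer. The class name and the default file path are hypothetical, and it assumes a Windows machine with a desktop speech recognizer installed and a project reference to System.Speech. It uses RecognizeAsync(RecognizeMode.Multiple) because Recognize() performs only a single recognition operation, which may be part of why a long recording yields just a few words.

```csharp
using System;
using System.Speech.Recognition;
using System.Text;
using System.Threading;

class WavTranscriber
{
    static void Main(string[] args)
    {
        // Hypothetical default path; pass your own WAV file as the first argument.
        string wavPath = args.Length > 0 ? args[0] : @"C:\temp\sample.wav";
        var transcript = new StringBuilder();

        using (var done = new ManualResetEventSlim(false))
        using (var sre = new SpeechRecognitionEngine())
        {
            // Free-text dictation instead of a fixed list of words.
            sre.LoadGrammar(new DictationGrammar());

            // Read audio from the file instead of the default audio device.
            sre.SetInputToWaveFile(wavPath);

            // Raised once per recognized phrase; collect each phrase.
            sre.SpeechRecognized += (sender, e) =>
                transcript.AppendLine(e.Result.Text);

            // Raised when recognition stops, e.g. at the end of the file.
            sre.RecognizeCompleted += (sender, e) =>
            {
                if (e.Error != null)
                {
                    Console.Error.WriteLine("Recognition error: " + e.Error.Message);
                }
                done.Set();
            };

            // Multiple mode keeps recognizing phrase after phrase
            // until the input ends or recognition is cancelled.
            sre.RecognizeAsync(RecognizeMode.Multiple);
            done.Wait();
        }

        Console.WriteLine(transcript.ToString());
    }
}
```

Even with this setup, accuracy still depends on the audio quality and on the engine itself, so the transcript of a phone recording may remain poor.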

You should use the SpeechRecognitionEngine. To use a wave file, call SetInputToWaveFile. I wish I could help you more, but I'm no expert.

Oh, and if your word is really triskaidekaphobia, I don't think even a human speech recognition engine would recognize that...




