Counting Syllables one char at a time [C]

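I am writing a program that reads text from a file and determines the number of sentences, words, and syllables in that file. The trick is that it must read only one character at a time and then work on that character. This means it can't simply store the whole file in an array.

So, with that in mind, here is how my program works: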

while(character != EOF)
{
    check if the character is an end-of-sentence marker (?:;.!)
    check if the character is whitespace (' ', '\t', '\n')
    (must be a letter now)
    check if the letter is a vowel
}
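
A minimal sketch of how that loop could look in C, assuming the input is a `FILE *` read with `getc`. The names `char_class`, `classify`, and `count_classes` are hypothetical and not taken from the original program; the sketch only tallies how many characters fall into each class.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical character classes, one per check in the loop. */
enum char_class { CLASS_SENTENCE_END, CLASS_WHITESPACE, CLASS_VOWEL, CLASS_OTHER };

/* Classify one character; ch is expected to be a value returned by getc(). */
enum char_class classify(int ch)
{
    if (ch == EOF || ch == '\0')
        return CLASS_OTHER;   /* guard: strchr() would also match the terminator */
    if (strchr("?:;.!", ch) != NULL)
        return CLASS_SENTENCE_END;
    if (isspace(ch))
        return CLASS_WHITESPACE;
    if (strchr("aeiou", tolower(ch)) != NULL)
        return CLASS_VOWEL;
    return CLASS_OTHER;       /* consonants, digits, other punctuation */
}

/* Read the stream one character at a time and tally each class. */
void count_classes(FILE *in, long counts[4])
{
    int ch;   /* int, not char, so EOF stays distinguishable from real data */

    counts[0] = counts[1] = counts[2] = counts[3] = 0;
    while ((ch = getc(in)) != EOF)
        counts[classify(ch)]++;
}
```

Keeping the classification in one function means the loop body only has to react to the class, not repeat the character comparisons.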

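Using a state-machine approach, certain triggers are either 1 or 0 on each pass through the loop, and this affects the counts. I have had no trouble counting sentences or words, but the syllables are giving me trouble. The definition of a syllable I am using is that any vowel or group of vowels counts as 1 syllable, but a lone e at the end of a word does not count as a syllable.

With that in mind, I have written the code so that: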

if character = 'A' || 'E' ... || 'o' || 'u'
    if the last character wasn't a vowel then
        set the flag for the letter being a vowel
        (so that next time through, it doesn't get counted)
        and add one to the syllable count.
    if the last character was a vowel, then don't change the flag and don't
    add to the count.

Now the problem I have is that my syllable count for a given text file is very low. The expected count is 57 syllables, 36 words, and 3 sentences. I get the sentences and the words correct, but my syllable count is only 35.

I also have it set up so that when the program reads a !:;.? or whitespace, it looks at the last character read, and if that is an e, it takes one off the syllable count. This takes care of an e at the end of a word not counting as a syllable.

So I know there must be something wrong with my method for there to be such a large difference. I must be forgetting something.

Does anyone have any suggestions? I would rather not include my entire program, but I can include specific blocks if necessary.

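EDIT: Some code...

I have an if statement for the end-of-sentence markers, then an else if for whitespace, then a final else, meaning that only letters that can form words reach this block. This is the only block of code that should have any effect on the syllable count: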

if(chrctr == 'A' || chrctr == 'E' || chrctr == 'I' || chrctr == 'O' || chrctr == 'U' || chrctr == 'a' || chrctr == 'e' || chrctr == 'i' || chrctr == 'o' || chrctr == 'u')
{
    if(chrctr == 'E' || chrctr == 'e')
    {
        isE = 1;
    }
    else
    {
        isE = 0;
    }
    if(skipSylb != 1)
    {
        endSylb = 1;
        skipSylb = 1;
    }
    else
    {
        endSylb = 0;
        skipSylb = 1;
    }
}
else
{
    endSylb = 0;
    skipSylb = 0;
}

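So, to explain briefly: if endSylb == 1, the program adds one to the syllable count later on. skipSylb flags whether the previous character was also a vowel. If skipSylb == 1, we are inside a block of vowels and only want to add one to the counter for the whole block. I also have an isE variable, which just tells the program on the next pass through the loop that the last letter was an E. That means that the next time through the while loop, if the character is an end-of-sentence marker or whitespace and the last letter was an E (so isE == 1), then we have added one syllable too many.

Hope this helps.

Since the value is actually lower than it should be, I thought the statements where I subtract from the count might be important too. I use this if statement to decide when to subtract from the count: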

 if(isE == 1)
       {
           countSylb --;
       } 

This statement runs when the character is whitespace or an end-of-sentence character. I can't think of anything else relevant, but I still feel like I'm not including enough. Oh well, let me know if something is unclear.

Answers

I also have it set up so that when the program reads a !:;.? or whitespace, it looks at the last character read, and if that is an e, it takes one off the syllable count.

This sounds wrong. What about words like "die" and "see"? Obviously you can only decrement the count if the word counted for more than one syllable.

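Decrementing only when the trailing e was not part of a vowel group might be enough.

If that doesn't work: maybe you don't clear the vowel flag after reading a consonant? I can't tell from your code.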
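
A minimal sketch of both suggestions, assuming a word is any run of letters and that y never counts as a vowel. The names `syl_counter`, `syl_feed`, `syl_end_word`, and `count_syllables` are hypothetical. The vowel flag is cleared on every consonant, and the trailing e is subtracted only when it formed its own vowel group and the word still has another syllable.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical per-stream state; nothing but the previous character's flags is kept. */
struct syl_counter {
    long total;          /* syllables in all finished words */
    int word_syllables;  /* vowel groups seen in the current word */
    int in_vowel_group;  /* previous character was a vowel */
    int lone_final_e;    /* previous character was an e that started its own group */
};

int is_vowel(int ch)
{
    ch = tolower((unsigned char)ch);
    return ch != '\0' && strchr("aeiou", ch) != NULL;
}

/* Called on whitespace, punctuation, or end of input. */
void syl_end_word(struct syl_counter *c)
{
    /* Drop a lone trailing e only if the word keeps at least one syllable. */
    if (c->lone_final_e && c->word_syllables > 1)
        c->word_syllables--;
    c->total += c->word_syllables;
    c->word_syllables = 0;
    c->in_vowel_group = 0;
    c->lone_final_e = 0;
}

/* Process exactly one character. */
void syl_feed(struct syl_counter *c, int ch)
{
    if (!isalpha((unsigned char)ch)) {
        syl_end_word(c);
        return;
    }
    if (is_vowel(ch)) {
        int starts_group = !c->in_vowel_group;
        if (starts_group)
            c->word_syllables++;
        c->lone_final_e = starts_group && tolower((unsigned char)ch) == 'e';
        c->in_vowel_group = 1;
    } else {
        /* A consonant clears the vowel flag, so the next vowel starts a new group. */
        c->in_vowel_group = 0;
        c->lone_final_e = 0;
    }
}

/* Convenience wrapper for trying the counter on a string. */
long count_syllables(const char *text)
{
    struct syl_counter c = {0, 0, 0, 0};

    for (; *text != '\0'; text++)
        syl_feed(&c, (unsigned char)*text);
    syl_end_word(&c);   /* flush a word that ends at end of input */
    return c.total;
}
```

With this sketch, "die", "see", and "the" each count as one syllable, and "made" drops from two vowel groups to one syllable.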

What could really help you is debug output. Let the program tell you what it is doing:

Read a vowel: e

Not counting the vowel e because [...]
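
A minimal sketch of such trace output, assuming the loop already keeps a flag saying whether the previous character was a vowel. `format_trace` and `trace_vowel` are hypothetical helpers, and the reason text is only an example of what a message might say.

```c
#include <stdio.h>

/* Hypothetical helper: build one trace line for a vowel that was just read.
 * prev_was_vowel is the flag the counting loop already keeps. */
int format_trace(char *buf, size_t size, int ch, int prev_was_vowel)
{
    if (prev_was_vowel)
        return snprintf(buf, size,
                        "Not counting vowel %c because the previous character was a vowel",
                        ch);
    return snprintf(buf, size, "Read a vowel: %c", ch);
}

/* Write the trace line to stderr so it does not mix with the normal output. */
void trace_vowel(int ch, int prev_was_vowel)
{
    char line[96];

    format_trace(line, sizeof line, ch, prev_was_vowel);
    fprintf(stderr, "%s\n", line);
}
```

Calling `trace_vowel` at every point where the count is or is not changed makes it easy to compare the trace against a word you have counted by hand.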

You need a Finite State Machine


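In a sense every program is a state machine, but in typical programming parlance a "state machine" means a strictly organized loop that does something like this: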

while (1) {
  switch (current_state) {
    case STATE_IDLE:
      if (evaluate some condition)
        next_state = STATE_THIS;
      else
        next_state = STATE_THAT;
      break;
    case STATE_THIS:
      // some other logic here
      break;
    case STATE_THAT:
      // yet more
      break;
  }
  current_state = next_state;
}

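Yes, you could solve this kind of problem with ordinary spaghetti code. Although legacy spaghetti code with literal gotos is rarely seen any more, there is a school of thought that resists grouping lots of conditionals and nested conditionals in a single function, in order to minimize cyclomatic complexity. Put another way, a pile of nested conditionals is the modern version of spaghetti code.

By at least organizing the control flow as a state machine, you flatten some of the logic into a single plane, which makes the operation easier to visualize and lets you change one piece at a time. This structure is seldom the shortest expression, but it is at least easy to modify and to alter incrementally.

Looking at your code, I suspect some of the logic is getting lost in the sheer volume of code. Your main code fragment looks equivalent to this: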
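
A minimal sketch of the syllable counter in that shape, assuming a word is any run of letters and y is never a vowel. The state names, `syl_fsm`, `fsm_step`, and `fsm_count` are hypothetical. An e that starts a vowel group is held in its own state and only counted once the next character shows whether it was a lone trailing e.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical states: where the previous character left us. */
enum syl_state { ST_OUTSIDE, ST_CONSONANT, ST_VOWEL, ST_LONE_E };

struct syl_fsm {
    enum syl_state state;
    long syllables;   /* running total */
    int in_word;      /* syllables already counted in the current word */
};

static void add_syllable(struct syl_fsm *m)
{
    m->syllables++;
    m->in_word++;
}

/* Advance the machine by exactly one character. */
void fsm_step(struct syl_fsm *m, int ch)
{
    int lower = tolower((unsigned char)ch);
    int letter = isalpha((unsigned char)ch) != 0;
    int vowel = letter && strchr("aeiou", lower) != NULL;
    enum syl_state next = ST_OUTSIDE;

    switch (m->state) {
    case ST_OUTSIDE:
    case ST_CONSONANT:
        if (!letter)
            next = ST_OUTSIDE;
        else if (!vowel)
            next = ST_CONSONANT;
        else if (lower == 'e')
            next = ST_LONE_E;          /* counted later, once we see what follows */
        else {
            add_syllable(m);
            next = ST_VOWEL;
        }
        break;
    case ST_VOWEL:
        if (!letter)
            next = ST_OUTSIDE;
        else
            next = vowel ? ST_VOWEL : ST_CONSONANT;
        break;
    case ST_LONE_E:
        if (!letter) {
            if (m->in_word == 0)       /* keep words like "the" at one syllable */
                add_syllable(m);
            next = ST_OUTSIDE;
        } else {
            add_syllable(m);           /* the e was not the end of the word */
            next = vowel ? ST_VOWEL : ST_CONSONANT;
        }
        break;
    }
    if (next == ST_OUTSIDE)
        m->in_word = 0;
    m->state = next;
}

/* Convenience wrapper for trying the machine on a string. */
long fsm_count(const char *text)
{
    struct syl_fsm m = { ST_OUTSIDE, 0, 0 };

    for (; *text != '\0'; text++)
        fsm_step(&m, (unsigned char)*text);
    fsm_step(&m, ' ');   /* flush a word that ends at end of input */
    return m.syllables;
}
```

Each rule lives in exactly one case, so changing how a trailing e is treated only touches the `ST_LONE_E` branch.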

chrctr = tolower(chrctr);

if (strchr("aeiou", chrctr)) {
    isE = (chrctr == 'e');
    endSylb = !skipSylb;
    skipSylb = 1; // May not be what you want, but it's what you have.
}
else {
    skipSylb = endSylb = 0;
}

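Personally, I think trying to count syllables algorithmically is nearly hopeless, but if you really want to, I would suggest looking at the steps of the Porter stemmer for guidance on breaking up English words in a somewhat meaningful way. Its purpose is to strip suffixes, but I think the problems it solves are similar enough that it may provide a little inspiration.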




