Question

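I'm writing a program that reads text from a file and determines the number of sentences, words, and syllables in that file. The catch is that it may only read one character at a time and then process it. That means it can't simply store the whole file in an array.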

So, with that in mind, here is how my program works:

while(character != EOF)
{
    check if the character is an end-of-sentence marker (?:;.!)
    check if the character is whitespace (' ', '\t', '\n')
    (must be a letter now)
    check if the letter is a vowel
}

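I'm using a state-machine approach: on each pass through the loop, certain flags are set to either 1 or 0, and these affect the counts. I have no trouble counting sentences or words, but syllables are giving me problems. The definition of a syllable I'm using is that any vowel or group of consecutive vowels counts as 1 syllable, but a lone e at the end of a word does not count as a syllable.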

With that in mind, I've written the code so that

if character == 'A' || 'E' ... || 'o' || 'u'
    if the last character wasn't a vowel then
    set the flag for the letter being a vowel
    (so that next time through, it doesn't get counted)
    and add one to the syllable count.
    if the last character was a vowel, then don't change the flag and don't
    add to the count.
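A minimal sketch of the vowel-group rule described above, in C, assuming characters are fed in one at a time (for example from fgetc) and that only a, e, i, o and u count as vowels. The names is_vowel, struct vowel_groups, vowel_groups_feed and vowel_groups_in are hypothetical, not taken from the original program. The silent-e rule is deliberately left out, so "cake" still counts as 2 here.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* 1 if ch is a, e, i, o or u in either case. 'y' is not treated as a vowel,
   matching the list in the question. */
static int is_vowel(int ch)
{
    ch = tolower((unsigned char)ch);
    return ch != '\0' && strchr("aeiou", ch) != NULL;
}

/* State carried from one character to the next. */
struct vowel_groups {
    int prev_vowel; /* 1 if the previous character was a vowel */
    int count;      /* vowel groups seen so far */
};

/* Feeds one character; a vowel is counted only when it starts a new group. */
static void vowel_groups_feed(struct vowel_groups *vg, int ch)
{
    assert(vg != NULL);
    if (is_vowel(ch)) {
        if (!vg->prev_vowel)
            vg->count++;
        vg->prev_vowel = 1;
    } else {
        vg->prev_vowel = 0; /* consonants, spaces and punctuation end a group */
    }
}

/* Hypothetical convenience wrapper that feeds a whole string. */
static int vowel_groups_in(const char *s)
{
    struct vowel_groups vg = {0, 0};
    while (*s != '\0')
        vowel_groups_feed(&vg, *s++);
    return vg.count;
}
```

Keeping the flag and the count together in one struct means the read loop only has to call vowel_groups_feed(&vg, ch) for each character; nothing beyond the previous character is remembered.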

Now the problem I have is that my count for a given text file is very low. The expected count is 57 syllables, 36 words, and 3 sentences. I get the sentences right, and the words too, but my syllable count is only 35.

I also have it set up so that when the program reads a !:;.? or whitespace, it looks at the last character read, and if that is an e, it takes one off the syllable count. This takes care of an e at the end of a word not counting as a vowel.

So I know there must be something wrong with my approach to produce such a large difference. I must be forgetting something.

Does anyone have any suggestions? I'd rather not include my whole program, but I can include specific blocks if necessary.

Edit: some code...

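If the character is an end-of-sentence marker, the first if statement runs; if it is whitespace, the else if statement runs; and the final else means that only letters that can form words reach the block below. This is the only block of code that affects the syllable count.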

if(chrctr == 'A' || chrctr == 'E' || chrctr == 'I' || chrctr == 'O' || chrctr == 'U' || chrctr == 'a' || chrctr == 'e' || chrctr == 'i' || chrctr == 'o' || chrctr == 'u')
{
    if(chrctr == 'E' || chrctr == 'e')
    {
        isE = 1;
    }
    else
    {
        isE = 0;
    }
    if(skipSylb != 1)
    {
        endSylb = 1;
        skipSylb = 1;
    }
    else
    {
        endSylb = 0;
        skipSylb = 1;
    }
}
else
{
    endSylb = 0;
    skipSylb = 0;
}

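So, a quick explanation... If endSylb equals 1, the program adds one to the syllable count later on. skipSylb marks whether the previous character was also a vowel. If skipSylb = 1, we are inside a block of vowels and only want to add 1 to the counter for the whole block. I also have an isE variable that simply tells the program, on the next pass through the loop, that the previous letter was an E. That means the next time into the while loop, if the character is the end of a sentence or a space and the previous letter was an E (so isE = 1), then we have counted one syllable too many.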

Hope this helps.

Since the value is actually lower than it should be, I thought the statement where I subtract from the count might matter too. I use this if statement to decide when to subtract from the count:

if(isE == 1)
{
    countSylb--;
}

This statement runs when the character is whitespace or an end-of-sentence character. I can't think of anything else relevant, but I still feel like I'm not including enough. Oh well, let me know if anything is unclear.

Answer 1

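"I also have it set up so that when the program reads a !:;.? or whitespace, it looks at the last character read, and if that is an e, it takes one off the syllable count."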

This sounds wrong. What about words like "die" and "see"? Obviously you can only decrement the count if the word counted for more than one syllable.

Decrementing may be enough if the final e was not part of a vowel group.

If that doesn't work: maybe you aren't clearing the vowel flag after reading a consonant? I can't tell from your code.
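A minimal sketch of this rule in C, assuming the letters of one word are fed in one at a time and the caller calls word_finish at whitespace or a sentence marker. The names struct word_state, word_feed, word_finish and syllables_in are hypothetical. The final-e flag is recomputed on every character, so a consonant after the e clears it; in the question's snippet, isE is only assigned in the vowel branch, so in a word like "yes" it is still 1 when the space arrives.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

static int is_vowel(int ch)
{
    ch = tolower((unsigned char)ch);
    return ch != '\0' && strchr("aeiou", ch) != NULL;
}

/* Per-word state, reset at every word boundary. */
struct word_state {
    int syllables;    /* vowel groups seen in this word */
    int prev_vowel;   /* previous character was a vowel */
    int lone_final_e; /* last character was an 'e' that started its own group */
};

/* Feeds one character of the current word. */
static void word_feed(struct word_state *w, int ch)
{
    assert(w != NULL);
    int vowel = is_vowel(ch);
    int new_group = vowel && !w->prev_vowel;
    if (new_group)
        w->syllables++;
    /* Recomputed every time, so any letter after the 'e' clears the flag. */
    w->lone_final_e = new_group && tolower((unsigned char)ch) == 'e';
    w->prev_vowel = vowel;
}

/* Call at whitespace or a sentence marker. Drops a lone final 'e' only when
   the word has more than one vowel group, then resets the state. */
static int word_finish(struct word_state *w)
{
    assert(w != NULL);
    int n = w->syllables;
    if (w->lone_final_e && n > 1)
        n--;
    w->syllables = w->prev_vowel = w->lone_final_e = 0;
    return n;
}

/* Hypothetical helper for checking a single word. */
static int syllables_in(const char *word)
{
    struct word_state w = {0, 0, 0};
    while (*word != '\0')
        word_feed(&w, *word++);
    return word_finish(&w);
}
```

With this rule "see" and "die" keep their single syllable because the final e belongs to a vowel group, "the" keeps it because the word has only one group, and "make" drops from 2 to 1.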

It really helps to add debug output. Have the program tell you what it is doing:

Read a vowel: e.

Not counting the e as a vowel because [...]
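One way to get that kind of trace, sketched in C: a hypothetical describe helper returns the decision for a character, and a hypothetical trace_char wrapper prints it. Both names and the message wording are illustrative assumptions, not part of the question's program.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: explains what the counter decides for one character,
   given whether the previous character was a vowel. */
static const char *describe(int ch, int prev_was_vowel)
{
    int lower = tolower((unsigned char)ch);
    if (lower != '\0' && strchr("aeiou", lower) != NULL)
        return prev_was_vowel ? "vowel, same group, not counted"
                              : "vowel, new group, counted";
    if (isspace((unsigned char)ch) || (ch != '\0' && strchr("?:;.!", ch) != NULL))
        return "word boundary, check for a final e";
    return "not a vowel, clears the vowel flag";
}

/* Prints one trace line; non-printable characters are shown as hex codes. */
static void trace_char(FILE *log, int ch, int prev_was_vowel)
{
    if (isprint((unsigned char)ch))
        fprintf(log, "read '%c': %s\n", ch, describe(ch, prev_was_vowel));
    else
        fprintf(log, "read 0x%02x: %s\n", (unsigned)(unsigned char)ch,
                describe(ch, prev_was_vowel));
}
```

Calling trace_char(stderr, ch, prev_vowel) at the top of the read loop, before any flags change, and running the program on a one-line file makes it easy to compare each decision with a hand count.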

Answer 2

You need a Finite State Machine

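In a sense, every program is a state machine, but in typical programming languages a "state machine" means a strictly organized loop that does something like this: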

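enum state { STATE_IDLE, STATE_THIS, STATE_THAT };
enum state current_state = STATE_IDLE;

while (1) {
  enum state next_state = current_state;  // stay put unless a case changes it
  switch(current_state) {
    case STATE_IDLE:
      if (evaluate some condition)
        next_state = STATE_THIS;
      else
        next_state = STATE_THAT;
      break;
    case STATE_THIS:
      // some other logic here
      break;
    case STATE_THAT:
      // yet more
      break;
  }
  current_state = next_state;
}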

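Yes, you can solve this kind of problem with ordinary spaghetti code. Although you rarely see legacy spaghetti code built on literal gotos anymore, there is a school of thought that resists piling lots of conditionals and nested conditionals into a single function, in order to minimize cyclomatic complexity. Put another way, a heap of nested conditionals is the modern version of spaghetti code.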

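By at least organizing the control flow into a state machine, you can flatten some of the logic onto a single level, which makes the operations easier to visualize and to change individually. While this structure is rarely the shortest way to express the logic, it is at least easy to modify and to change incrementally.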
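A sketch of the question's counter written in this style, in C. The state names, struct counter, counter_feed and counter_finish are hypothetical. Assumptions: only whitespace and ?:;.! end a word, a sentence marker counts only when it directly follows a word, and any other character (digit, apostrophe, comma) is treated like a consonant. The final-e handling follows Answer 1: a lone final e is dropped only when the word has more than one vowel group.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* States of the hypothetical counter. */
enum state {
    OUTSIDE_WORD,  /* between words */
    IN_CONSONANTS, /* inside a word, last character was not a vowel */
    IN_VOWELS      /* inside a word, last character was a vowel */
};

struct counter {
    enum state state;
    int words, sentences, syllables;
    int word_syllables; /* vowel groups in the current word */
    int lone_final_e;   /* current word ends in an 'e' that began its own group */
};

static int is_vowel(int ch)
{
    ch = tolower((unsigned char)ch);
    return ch != '\0' && strchr("aeiou", ch) != NULL;
}

static int is_sentence_end(int ch)
{
    return ch != '\0' && strchr("?:;.!", ch) != NULL;
}

/* Closes the current word: applies the final-e rule, then adds its syllables. */
static void end_word(struct counter *c)
{
    if (c->lone_final_e && c->word_syllables > 1)
        c->word_syllables--;
    c->syllables += c->word_syllables;
    c->word_syllables = 0;
    c->lone_final_e = 0;
}

/* Feeds one character; every transition lives in this one switch. */
void counter_feed(struct counter *c, int ch)
{
    assert(c != NULL);
    int boundary = isspace((unsigned char)ch) || is_sentence_end(ch);

    switch (c->state) {
    case OUTSIDE_WORD:
        if (boundary)
            break;              /* still between words */
        c->words++;             /* first character of a new word */
        /* fall through */
    case IN_CONSONANTS:
    case IN_VOWELS:
        if (boundary) {
            if (is_sentence_end(ch))
                c->sentences++; /* assumption: marker directly follows a word */
            end_word(c);
            c->state = OUTSIDE_WORD;
        } else if (is_vowel(ch)) {
            int new_group = (c->state != IN_VOWELS);
            if (new_group)
                c->word_syllables++;
            c->lone_final_e = new_group && tolower((unsigned char)ch) == 'e';
            c->state = IN_VOWELS;
        } else {
            c->lone_final_e = 0; /* a letter after the e means it is not final */
            c->state = IN_CONSONANTS;
        }
        break;
    }
}

/* Call once at EOF so a last word with no trailing whitespace is counted. */
void counter_finish(struct counter *c)
{
    assert(c != NULL);
    if (c->state != OUTSIDE_WORD) {
        end_word(c);
        c->state = OUTSIDE_WORD;
    }
}
```

The read loop then reduces to `while ((ch = fgetc(fp)) != EOF) counter_feed(&c, ch);` followed by `counter_finish(&c);`. Because each transition sits in one switch, handling something new, such as a comma ending a word, means changing a single branch.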

Answer 3

Looking at your code, I suspect some of the logic is getting lost in the sheer amount of code. Your main snippet looks equivalent to this:

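#include <ctype.h>   // tolower
#include <string.h>  // strchr

chrctr = tolower((unsigned char)chrctr);

// strchr takes the string first, then the character to look for.
// The '\0' check matters because strchr also matches the terminator.
if (chrctr != '\0' && strchr("aeiou", chrctr)) {
    isE = (chrctr == 'e');
    endSylb = !skipSylb;
    skipSylb = 1; // May not be what you want, but it's what you have.
}
else {
    skipSylb = endSylb = 0;
}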

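Personally, I think counting syllables algorithmically is close to hopeless, but if you really want to, I'd suggest looking at the steps of the Porter stemmer for a somewhat meaningful way to split up English words. Its purpose is stripping suffixes, but I think the problem it solves is similar enough that it might provide a bit of inspiration.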
