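I'm writing a program that reads text from a file and determines the number of sentences, words, and syllables in that file. The catch is that it may only read one character at a time and must process each character as it goes. This means it can't simply store the whole file in an array.
So, with that in mind, here is how my program works: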
while(character != EOF)
{
    check if the character is an end-of-sentence marker (?:;.!)
    check if the character is whitespace (space, tab, newline)
    (must be a letter now)
    check if the letter is a vowel
}
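The outline above could be sketched in C roughly as follows. This is only an illustration under assumptions: the names `TextCounts`, `is_sentence_end`, `feed_char`, and `count_stream` are hypothetical and not taken from the actual program, and any character that is neither a sentence marker nor whitespace is treated as part of a word.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical state carried from one character to the next. */
typedef struct {
    long sentences;
    long words;
    int in_word;   /* 1 while the previous character belonged to a word */
} TextCounts;

/* Returns 1 for the end-of-sentence markers listed in the outline (?:;.!). */
int is_sentence_end(int ch)
{
    return ch == '?' || ch == ':' || ch == ';' || ch == '.' || ch == '!';
}

/* Process exactly one character and update the counts. */
void feed_char(TextCounts *tc, int ch)
{
    assert(tc != NULL);
    if (is_sentence_end(ch)) {
        tc->sentences++;
        tc->in_word = 0;
    } else if (isspace((unsigned char)ch)) {
        tc->in_word = 0;
    } else {
        /* Assumed to be a letter: count the word at its first character. */
        if (!tc->in_word) {
            tc->words++;
            tc->in_word = 1;
        }
    }
}

/* Read a stream one character at a time, never buffering the whole file. */
void count_stream(FILE *in, TextCounts *tc)
{
    int ch;   /* int rather than char, so EOF is distinguishable from data */
    assert(in != NULL);
    while ((ch = fgetc(in)) != EOF) {
        feed_char(tc, ch);
    }
}
```

Keeping the per-character logic in `feed_char` separates the state machine from the file handling, so the same function can be driven from a string while debugging.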
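Using a state-machine approach, on each pass through the loop certain flags are set to either 1 or 0, and these affect the counts. I have no trouble counting sentences or words, but the syllables are giving me trouble. The definition of a syllable I'm using is that any vowel or run of consecutive vowels counts as 1 syllable, but a lone e at the end of a word does not count as a syllable.
With that in mind, I've written the code so that: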
if character == A || E || ... || o || u
    if the last character wasn't a vowel then
        set the flag marking this letter as a vowel
        (so that next time through, it doesn't get counted)
        and add one to the syllable count.
    if the last character was a vowel, then don't change the flag and don't
        add to the count.
Now the problem I have is that my count for a given text file is very low. The expected counts are 57 syllables, 36 words, and 3 sentences. I get the sentences and the words right, but my syllable count is only 35.
I also have it set up so that when the program reads one of !:;.? or whitespace, it looks at the last character read and, if that was an e, takes one off the syllable count. This takes care of an e at the end of a word not counting as a syllable.
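So I know there must be something wrong with my method for it to produce such a big difference. I must be forgetting something.
Does anyone have any suggestions? I'd rather not include my whole program unless it's necessary, but I can include specific blocks.
EDIT: Some code...
There is an if statement before this for the end-of-sentence markers, then an else if for whitespace, and this is the final else, meaning only letters that can form words reach this block. This is the only block of code that adds to the syllable count.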
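if(chrctr == 'A' || chrctr == 'E' || chrctr == 'I' || chrctr == 'O' || chrctr == 'U' || chrctr == 'a' || chrctr == 'e' || chrctr == 'i' || chrctr == 'o' || chrctr == 'u')
{
    /* remember whether this vowel is an e, for the trailing-e check */
    if(chrctr == 'E' || chrctr == 'e')
    {
        isE = 1;
    }
    else
    {
        isE = 0;
    }
    /* only the first vowel of a run sets endSylb */
    if(skipSylb != 1)
    {
        endSylb = 1;
        skipSylb = 1;
    }
    else
    {
        endSylb = 0;
        skipSylb = 1;
    }
}
else
{
    /* not a vowel: note that isE is left unchanged here */
    endSylb = 0;
    skipSylb = 0;
}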
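So, a quick explanation... If endSylb equals 1, the program will add one to the syllable count later on. skipSylb flags whether the previous character was also a vowel. If skipSylb = 1, we are inside a run of vowels, and we only want to add 1 to the counter for the whole run. Then I have the isE variable, which just tells the program on the next pass through the loop that the previous letter was an E. That means that the next time through the while loop, if the character is an end-of-sentence marker or whitespace and the previous letter was an E (so isE = 1), then we have counted one syllable too many.
Hope this helps.
Since the value is actually lower than it should be, I thought the statements where I subtract from the count might be important too. I use this if statement to decide when to subtract from the count: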
if(isE == 1)
{
    countSylb--;
}
This statement runs when the character is whitespace or an end-of-sentence character. I can't think of anything else relevant, but I still feel like I'm not including enough. Oh well, let me know if something is unclear.