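I'm looking for a library that can perform morphological analysis of German words, i.e. convert any word to its root form (lemma) and provide meta information about the analysed word.
For example: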
gegessen -> essen
wurde [...] gefasst -> fassen
Häuser -> Haus
Hunde -> Hund
My wish list:
- It has to work with both nouns and verbs.
- I'm aware that this is a very hard task given the complexity of the German language, so I'm also looking for libraries that provide only approximations or may be only 80% accurate.
- I'd prefer libraries that don't work with dictionaries, but again I'm open to compromise given the circumstances.
- I'd also prefer C/C++/Delphi Windows libraries, because they would be easier to integrate, but .NET, Java, ... will also do.
- It has to be a free library: (L)GPL, MPL, ...
EDIT: I'm aware that there is no way to perform a morphological analysis without any dictionary at all, because of the irregular words. When I say I prefer a library without a dictionary, I mean those full-blown dictionaries that map each and every word form:
arbeite -> arbeiten
arbeitest -> arbeiten
arbeitet -> arbeiten
arbeitete -> arbeiten
arbeitetest -> arbeiten
arbeiteten -> arbeiten
arbeitetet -> arbeiten
gearbeitet -> arbeiten
...
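Those dictionaries have several drawbacks, including their huge size and their inability to handle unknown words.
Of course, all the exceptions can only be handled with a dictionary: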
esse -> essen
isst -> essen
eßt -> essen
aß -> essen
aßt -> essen
aßen -> essen
...
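To make the distinction in the EDIT concrete, here is a minimal Python sketch of the compromise described above: a small exception table for irregular forms plus suffix-stripping rules for regular ones, instead of a full-form dictionary. It is not any existing library; the suffix lists, the `lemmatize` function, and the assumption that capitalised words are nouns are all hypothetical simplifications for illustration.

```python
# Hypothetical sketch of a rule-based German lemmatizer: a small exception
# table for irregular forms plus naive suffix stripping for regular ones.
# Not a real library; the rule set is a tiny illustrative subset.

# Irregular forms that suffix rules cannot recover (strong verbs, umlaut plurals).
EXCEPTIONS = {
    "esse": "essen", "isst": "essen", "eßt": "essen",
    "aß": "essen", "aßt": "essen", "aßen": "essen", "gegessen": "essen",
    "Häuser": "Haus",
}

# Longest suffixes first so that e.g. "-eten" wins over "-en".
VERB_SUFFIXES = ("etest", "etet", "eten", "ete", "est", "en", "et", "st", "e", "t")
NOUN_SUFFIXES = ("en", "er", "e", "n")
MIN_STEM = 3  # never strip a word down to fewer than 3 characters


def _strip(word, suffixes):
    """Remove the first (longest) matching suffix that leaves a long enough stem."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Return an approximate lemma for a single German word form."""
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    if word[:1].isupper():
        # Assumption: capitalised words are nouns (ignores sentence-initial words).
        return _strip(word, NOUN_SUFFIXES)
    w = word
    # Naive weak past participle handling: ge...t -> drop the "ge" prefix.
    if w.startswith("ge") and w.endswith("t") and len(w) > 5:
        w = w[2:]
    # Stems ending in s/ß absorb the "s" of "-st": (du) fasst = fass + t.
    if w.endswith(("sst", "ßt")):
        return w[:-1] + "en"
    # Everything else: strip a regular verb ending and append the infinitive "-en".
    return _strip(w, VERB_SUFFIXES) + "en"


if __name__ == "__main__":
    for form in ("arbeitetest", "gearbeitet", "gefasst", "aßen", "Hunde", "Häuser"):
        print(form, "->", lemmatize(form))
```

The rules illustrate the 80% trade-off rather than hide it: a singular noun ending in -e such as "Katze" is over-stripped to "Katz", and every irregular verb still needs its own entry in the exception table.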
My head is spinning right now :)