Question

What is the exact difference between stemming and depluralization?

Or do they mean the same thing?

Answer 1

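First, stemming refers to the process of reducing a word to its stem. However, this can mean several different things. Most linguists distinguish at least two ways: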

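Removing grammatical, but not derivational, morphemes. A grammatical morpheme is the part of a word that relates to its grammatical role in a particular sentence, e.g. number, case, gender, tense, aspect, etc.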

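Removing both grammatical and derivational morphemes. A derivational morpheme is the part of a word that relates to its derivation from another word, e.g. the "-er" in "worker" relates to how the word is made from (or can be seen as derived from) "work".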

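Hence depluralization, which is a rather unusual term but apparently refers to removing a plural morpheme (such as the "-s" at the end of "computers"), is one part of a certain kind of stemming: specifically, of removing grammatical (but not derivational) morphemes.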

In English, the morphology of nouns is largely limited to plural ("computers") and genitive (second case, "computer's"). Hence, as far as English is concerned, depluralization may be seen as (almost) synonymous with (grammatical) stemming, at least to the extent that stemming is applied to nouns and, to some degree, adjectives (which it is, e.g., in the context of information retrieval). However, wherever verbs are considered, past tense, passive voice and other inflectional forms are subject to stemming (but not to depluralization).
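To make the three notions concrete, here is a minimal sketch in Python. The suffix lists and the function names (`strip_suffix`, `depluralize`, `grammatical_stem`, `full_stem`) are my own toy assumptions for illustration, not a real stemming algorithm, and they will mishandle plenty of words (for example, anything that merely happens to end in "s").

```python
# Toy suffix lists (hypothetical and far from complete).
PLURAL_SUFFIXES = ("s",)
GRAMMATICAL_SUFFIXES = ("ing", "ed", "s")   # inflection: tense, number
DERIVATIONAL_SUFFIXES = ("er", "ness")      # word formation


def strip_suffix(word, suffixes, min_stem=3):
    """Remove the first matching suffix, keeping at least min_stem characters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def depluralize(word):
    """Remove only a plural ending."""
    return strip_suffix(word, PLURAL_SUFFIXES)


def grammatical_stem(word):
    """Remove grammatical (inflectional) endings, including verb endings."""
    return strip_suffix(word, GRAMMATICAL_SUFFIXES)


def full_stem(word):
    """Remove a grammatical ending first, then a derivational one."""
    return strip_suffix(grammatical_stem(word), DERIVATIONAL_SUFFIXES)


for word in ("workers", "worked"):
    print(word, depluralize(word), grammatical_stem(word), full_stem(word))
# workers worker worker work
# worked worked work work
```

Note how "worked" is left alone by `depluralize` but reduced by `grammatical_stem`, while only `full_stem` goes from "workers" all the way to "work".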

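Furthermore, nouns in languages other than English may have a very rich morphology, including morphemes for case, politeness level, or special kinds of plural (such as the dual). Depluralization (if you want to use that term) thus refers to only a very small part of the entire stemming process.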

Another related term is lemmatization, which is often used synonymously with stemming. One distinction between the two that I found many people (including myself) to make is this:

Stemming refers to rule-based or machine-learning-based techniques for removing affix-like parts of a word (mostly endings).

Lemmatization refers to the same kind of process, but one that uses an actual dictionary to handle highly irregular forms (such as the plural "women").

(Not everyone will agree with this distinction.)
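As a rough illustration of that distinction, here is a small Python sketch. The names `IRREGULAR_FORMS`, `rule_based_stem` and `lemmatize` are hypothetical, and the dictionary is a tiny stand-in that I made up (apart from "women"); a real lemmatizer would rely on a full lexicon.

```python
# Hypothetical mini-dictionary of irregular forms (illustrative only).
IRREGULAR_FORMS = {
    "women": "woman",
    "children": "child",
    "mice": "mouse",
}


def rule_based_stem(word):
    """Strip a common ending using rules only, with no dictionary lookup."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Check the dictionary first, then fall back to the suffix rules."""
    if word in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[word]
    return rule_based_stem(word)


print(rule_based_stem("women"))  # women (no rule applies)
print(lemmatize("women"))        # woman (found in the dictionary)
print(lemmatize("computers"))    # computer (falls back to the rules)
```

The rules alone cannot recover "woman" from "women", which is exactly the kind of form the dictionary is there for.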

Answer 2

They are not the same. There are several ways to stem a word; depluralization is one strategy.

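Just one quick example: a stemmer might reduce a derived word to "child" or to "system" by stripping a derivational ending, whereas a depluralization algorithm would not.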

Answer 3

Stemming is converting multiple words with the same root to one word. Ex. "cats", "catlike", "catty" to "cat"

Depluralization is converting plural words into singular. Ex. "cats" to "cat"

Additional info for stemming and algorithms http://en.wikipedia.org/wiki/Stemming#Algorithms
