Understanding stemming: what is the exact difference between stemming and depluralization?
Or do they mean the same thing?
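First of all, stemming refers to the process of reducing a word to its stem. However, this can mean a number of different things. Most linguists distinguish at least two kinds: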
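1. Removing grammatical (inflectional) morphemes only, such as plural or tense endings.
2. Removing both grammatical and derivational morphemes. A derivational morpheme is the part of a word that relates to how it was derived from another word, e.g. the "-er" in "worker" relates to how that word is derived (or can be considered derived) from "work".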
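Hence depluralization, which is a rather unusual term but apparently refers to the removal of plural morphemes (such as the "-s" at the end of "computers"), is a special case of stemming, specifically of the removal of grammatical (but not derivational) morphemes.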
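The two kinds of stemming described above can be sketched in a few lines. This is only a toy illustration, assuming tiny hypothetical suffix lists (`INFLECTIONAL`, `DERIVATIONAL`); real stemmers such as Porter's use far more elaborate rules.

```python
# Hypothetical, deliberately tiny suffix lists for illustration only.
INFLECTIONAL = ("ing", "ed", "s")     # grammatical morphemes: tense, number
DERIVATIONAL = ("ness", "er", "ly")   # word-forming (derivational) morphemes


def strip_suffix(word: str, suffixes: tuple, min_stem: int = 3) -> str:
    """Remove the first matching suffix, keeping at least min_stem characters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def stem(word: str, derivational: bool = False) -> str:
    """Kind 1 (default): strip grammatical morphemes only.
    Kind 2 (derivational=True): additionally strip a derivational morpheme."""
    word = strip_suffix(word, INFLECTIONAL)
    if derivational:
        word = strip_suffix(word, DERIVATIONAL)
    return word


print(stem("workers"))                     # worker
print(stem("workers", derivational=True))  # work
```

Under the first kind, "workers" only loses its plural ending; under the second kind, the "-er" of "worker" is removed as well.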
In English, the morphology of nouns is largely limited to the plural ("computers") and the genitive ("computer's"). Hence, as far as English is concerned, depluralization may be seen as (almost) synonymous with (grammatical) stemming, at least to the extent that stemming is applied to nouns and, to some degree, adjectives (which it is, e.g., in the context of information retrieval). However, whenever verbs are considered, past tense, passive voice and other inflectional forms are subject to stemming (but not to depluralization).
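A rough rule-based depluralizer for regular English nouns might look like this. It is a sketch under stated assumptions: the spelling rules are simplified heuristics chosen for illustration (not taken from any particular library), they will get some words wrong (e.g. "buses"), and irregular plurals like "women" are not handled at all.

```python
def depluralize(noun: str) -> str:
    """Map a regular English plural noun to its singular (simplified heuristics).
    Words that do not look like plurals, including verb forms, are left unchanged."""
    if noun.endswith("ies") and len(noun) > 4:
        return noun[:-3] + "y"            # cities -> city
    if noun.endswith("sses"):
        return noun[:-2]                  # glasses -> glass
    if noun.endswith(("xes", "ches", "shes")):
        return noun[:-2]                  # boxes -> box, dishes -> dish
    if noun.endswith("s") and not noun.endswith(("ss", "us", "is")):
        return noun[:-1]                  # computers -> computer
    return noun                           # no plural ending recognized


for w in ["computers", "cities", "boxes", "glass", "worked"]:
    print(w, "->", depluralize(w))
```

Note that "worked" passes through untouched: the past-tense ending is an inflection a stemmer would remove, but it is not a plural.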
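Moreover, in languages other than English, nouns may have a much richer morphology, including morphemes for things such as case, politeness level, or special kinds of number (e.g. the dual). Hence depluralization (if you want to use that term at all) would refer to only a very small part of the overall stemming process.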
Another related term is lemmatization, which is often used synonymously with stemming. One distinction between the two that I found many people (including myself) to make is this:
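Stemming refers to rule-based or machine-learning-based techniques that cut off parts of a word (mostly endings).
Lemmatization refers to the same kind of process, but one that uses an actual dictionary to handle highly irregular forms (such as the plural "women").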
(Not everyone will agree with this distinction.)
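The distinction above can be illustrated with a toy example. Both functions are hypothetical sketches: `LEXICON` stands in for the full dictionary a real lemmatizer would consult (real lemmatizers typically also take the part of speech into account), and the suffix rules are deliberately minimal.

```python
# Hypothetical mini-dictionary of irregular forms; a real lemmatizer uses a full lexicon.
LEXICON = {"women": "woman", "children": "child", "mice": "mouse", "went": "go"}

SUFFIXES = ("ing", "ed", "s")


def stem(word: str) -> str:
    """Rule-based only: chop the first matching ending, no dictionary."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word: str) -> str:
    """Look up irregular forms in the dictionary first, then fall back to the rules."""
    return LEXICON.get(word, stem(word))


for w in ["women", "computers", "went"]:
    print(f"{w}: stem={stem(w)!r}, lemma={lemmatize(w)!r}")
```

The rule-based stemmer has no ending to remove from "women" or "went" and leaves them unchanged, whereas the dictionary lookup maps them to "woman" and "go".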
They are not the same. There are several ways to stem a word, and depluralization is one strategy.
To give a quick example: a stemmer might turn "childish" into "child" or "systematic" into "system", whereas a depluralization algorithm would not.
Stemming is converting multiple words with the same root to one word. Ex. "cats", "catlike", "catty" to "cat"
Depluralization is converting plural words into singular. Ex. "cats" to "cat"
Additional info for stemming and algorithms http://en.wikipedia.org/wiki/Stemming#Algorithms