The approach to calculating similar objects based on certain weighted criteria

I have a site with multiple project objects. Each project has (for example):

  • multiple tags
  • multiple categories
  • a size
  • multiple types
  • etc.

I would like to write a method to grab all similar projects based on the above criteria. I can easily retrieve similar projects for each of the above individually (i.e. projects of a similar size, or projects that share a category, etc.), but I would like it to be more intelligent than just choosing projects that have all of the above in common, or projects that have at least one of the above in common.

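Ideally, I would like to weight each of the criteria, i.e. a project that shares a tag is not as similar as a project that is close in size. A project with two tags in common is more similar than a project with one tag in common.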

What approach (practical and mathematical) can I take to do this?

Answers

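The problem is that there are obviously many ways to solve this.

First, define a similarity measure for each attribute (tag similarity, size similarity, description similarity, ...).

Then try to normalize all of these similarities onto a common scale, e.g. 0 to 1 with 0 being most similar, so that the values also have a similar distribution.

Next, assign each feature a weight. For example, tag similarity is more important than description similarity.

Finally, compute the weighted sum of the individual similarities and compare the results.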
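
The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Project` fields, the use of Jaccard distance for tags and categories, the relative size difference, and the weight values are all made up for the example and are not part of the answer. Scores use the 0-to-1 scale described above, with 0 meaning most similar, and the weighted sum is divided by the total weight so the result stays on that scale.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    # Hypothetical project record; the field names are assumptions.
    name: str
    tags: set = field(default_factory=set)
    categories: set = field(default_factory=set)
    size: float = 0.0  # assumed to be non-negative


def jaccard_distance(a: set, b: set) -> float:
    """0.0 for identical sets, 1.0 for sets with nothing in common."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def size_distance(a: float, b: float) -> float:
    """Relative size difference in 0..1 (for non-negative sizes)."""
    largest = max(a, b)
    if largest == 0:
        return 0.0
    return abs(a - b) / largest


# Assumed weights: tags matter most, size least.
WEIGHTS = {"tags": 3.0, "categories": 2.0, "size": 1.0}


def dissimilarity(p: Project, q: Project) -> float:
    """Weighted average of per-attribute distances; 0 means most similar."""
    parts = {
        "tags": jaccard_distance(p.tags, q.tags),
        "categories": jaccard_distance(p.categories, q.categories),
        "size": size_distance(p.size, q.size),
    }
    total = sum(WEIGHTS[key] * parts[key] for key in parts)
    return total / sum(WEIGHTS.values())


def most_similar(target: Project, others: list, k: int = 5) -> list:
    """Return the k projects with the lowest dissimilarity to target."""
    ranked = sorted(others, key=lambda other: dissimilarity(target, other))
    return ranked[:k]
```

Whether Jaccard distance and a relative size difference are good choices depends on the data; any per-attribute measure that lands on the same 0-to-1 scale could be swapped in.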

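There are countless ways to do this, because you can obviously choose the weights arbitrarily, there are already various options for each single-attribute similarity, and there are various ways to normalize the individual values. And so on.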

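There are methods for learning the weights. See ensemble methods. However, to learn them you need user input on what is a good result and what is not. Do you have such training data?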

  1. Start with a value of 100 in each category.
  2. Apply penalties. Like, -1 for each kB difference in size, or -2 for each tag not found in the other project. You end up with a value of 0..100 in each category.
  3. Multiply each category's value by the "weight" of the category (e.g., similarity in size is multiplied by 1, similarity in tags by 3, similarity in types by 2).
  4. Add up the weighted values.
  5. Divide by the sum of weight factors (in my example, 1 + 3 + 2 = 6) to get an overall similarity of 0..100.
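
The five steps above could look something like this in code. It is a minimal sketch, not a definitive implementation: the dictionary keys (`size_kb`, `tags`, `types`) are hypothetical, "each tag not found in the other project" is interpreted as the symmetric difference of the two tag sets, the same -2 penalty is assumed for types, and scores are clamped so they stay within 0..100.

```python
def clamp(value: float, low: float = 0.0, high: float = 100.0) -> float:
    """Keep a score inside the 0..100 range."""
    return max(low, min(high, value))


def size_score(size_a_kb: float, size_b_kb: float) -> float:
    # Start at 100 and subtract 1 for each kB of difference.
    return clamp(100 - 1 * abs(size_a_kb - size_b_kb))


def set_score(a: set, b: set, penalty: float = 2.0) -> float:
    # Start at 100 and subtract `penalty` for each item that appears
    # in only one of the two sets (assumed interpretation).
    return clamp(100 - penalty * len(a ^ b))


# Weights from the example: size x1, tags x3, types x2.
WEIGHTS = {"size": 1, "tags": 3, "types": 2}


def overall_similarity(p: dict, q: dict) -> float:
    """Weighted average of the per-category scores, in 0..100."""
    scores = {
        "size": size_score(p["size_kb"], q["size_kb"]),
        "tags": set_score(p["tags"], q["tags"]),
        "types": set_score(p["types"], q["types"]),
    }
    weighted = sum(WEIGHTS[key] * scores[key] for key in scores)
    return weighted / sum(WEIGHTS.values())
```

For example, two projects that differ by 20 kB, have two unshared tags between them, and have identical types score 80, 96 and 100 in the three categories, giving (80 + 3 * 96 + 2 * 100) / 6 = 568 / 6, or roughly 94.67 overall.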

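Whether you can reduce the number of project comparisons below the initial O(n^2) (i.e. comparing each project with every other one) depends heavily on context. It might be the real crux of your software, or it might not be necessary at all if n is low.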
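
As one possible illustration of cutting down the comparisons, here is a sketch of an inverted index from tag to projects. This technique is not mentioned in the answer, and it rests on the assumption that projects sharing no tag at all never need to be scored against each other. The function names and the `projects` mapping (project id to set of tags) are hypothetical.

```python
from collections import defaultdict


def build_tag_index(projects: dict) -> dict:
    """Map each tag to the set of project ids that carry it.

    `projects` is a hypothetical mapping of project id -> set of tags.
    """
    index = defaultdict(set)
    for project_id, tags in projects.items():
        for tag in tags:
            index[tag].add(project_id)
    return index


def candidates(project_id, projects: dict, index: dict) -> set:
    """Ids of all other projects sharing at least one tag with project_id."""
    found = set()
    for tag in projects[project_id]:
        found |= index.get(tag, set())
    found.discard(project_id)
    return found
```

Only the returned candidates would then be passed to the full weighted comparison. If most projects share a few very common tags, the candidate sets stay large and this saves little.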




