Algorithm to compare lists of numbers for similarity?

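Forgive me if this is a completely unclear question, but I'm trying to find similar values across lists. More specifically, I'd like to see whether I can rank the items by similarity.

I know that in Python I can take a list and check whether it is identical to another, but what if the lists are not exactly the same and merely have somewhat similar values (or not)?

For example: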

#Batch one 
[1, 10, 20]
[5, 15, 10]
[70, 19, 15]
[50, 40, 20]


#Batch two 
[46, 19, 8]
[6, 14, 8]
[2, 11, 44]

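I'd like to rank the two batches by how similar they are to each other. I thought I could just add up all the numbers and compare the totals, but I don't think that would work, because [5, 6, 1000] and [600, 200, 211] would appear similar. In the example, [5, 15, 10] and [6, 14, 8] should get the highest score.

I also thought about dividing each pair of values and looking at the percent difference, but that seems really expensive if the lists have many variables (I may end up with thousands of lists, each with over 800 variables).

Any suggestions?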

Answers
a = [1, 10, 20]
b = [5, 15, 10]
c = [70, 19, 15]
d = [50, 40, 20]

def sim(seqA, seqB):
    # Sum of absolute element-wise differences (the Manhattan / L1 distance).
    return sum(abs(x - y) for x, y in zip(seqA, seqB))


print(sim(a, a))  # => 0
print(sim(a, b))  # => 19
print(sim(a, c))  # => 83
print(sim(a, d))  # => 79

Lower number means more similar. 0 means identical.
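
To rank every list in batch one against every list in batch two, the same score can be computed for all pairs at once. This is only a sketch, assuming NumPy is available and all lists have the same length; the names `batch_one`, `batch_two` and `pairwise_l1` are made up for illustration.

```python
import numpy as np

batch_one = np.array([[1, 10, 20], [5, 15, 10], [70, 19, 15], [50, 40, 20]])
batch_two = np.array([[46, 19, 8], [6, 14, 8], [2, 11, 44]])

def pairwise_l1(rows_a, rows_b):
    # Broadcasting builds a (len(rows_a), len(rows_b), n_values) array of
    # differences; summing absolute values over the last axis leaves one
    # score per pair of lists.
    return np.abs(rows_a[:, None, :] - rows_b[None, :, :]).sum(axis=2)

scores = pairwise_l1(batch_one, batch_two)

# Index of the smallest score, i.e. the most similar pair.
i, j = np.unravel_index(np.argmin(scores), scores.shape)
print(scores)
print(batch_one[i], batch_two[j], scores[i, j])  # [ 5 15 10] [ 6 14  8] 4
```

With the data from the question, the best-scoring pair is [5, 15, 10] and [6, 14, 8], as hoped. For thousands of lists with 800 values each, the intermediate three-dimensional array can get very large, so it would probably be necessary to process the batches in chunks.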

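If I understand you correctly, you basically want to see which of your groups is the most tightly clustered?

So, if you think of your data as a group of points in 3D space, you're trying to find the spread of each group?

(In other words, you want to compare how internally similar the two batches are?)

In that case, consider something like the following (using numpy for speed):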

import numpy as np

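def spread(group):
    # Sum of the per-coordinate (population) variances of the points.
    return group.var(axis=0).sum()

# np.float was removed from recent NumPy releases; the builtin float works.
group1 = np.array([[1, 10, 20],
                   [5, 15, 10],
                   [70, 19, 15],
                   [50, 40, 20]], dtype=float)
group2 = np.array([[46, 19, 8],
                   [6, 14, 8],
                   [2, 11, 44]], dtype=float)

print(spread(group1), spread(group2))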

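So, in this case, group 2 is the more tightly clustered of the two (a spread of about 694, versus about 1012 for group 1).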

If, instead, you're interested in finding how "close" the two groups are to each other, then you could compare the distance between their centers:

legs = group1.mean(axis=0) - group2.mean(axis=0)
distance = np.sqrt(np.sum(legs**2))

Or do you want to find the two "points" within each group that are the closest? (In which case you'd use a distance matrix, or a more efficient algorithm for more points...)
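
A distance matrix for that last case might look like the following. It is a rough sketch only, assuming Euclidean distance is the measure you want; `closest_pair` is a hypothetical helper name, not something from NumPy.

```python
import numpy as np

def closest_pair(group):
    # Full matrix of Euclidean distances between every pair of rows.
    diffs = group[:, None, :] - group[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    # Every point is at distance 0 from itself, so mask the diagonal.
    np.fill_diagonal(dist, np.inf)
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    return int(i), int(j), float(dist[i, j])

group1 = np.array([[1, 10, 20],
                   [5, 15, 10],
                   [70, 19, 15],
                   [50, 40, 20]], dtype=float)
group2 = np.array([[46, 19, 8],
                   [6, 14, 8],
                   [2, 11, 44]], dtype=float)

print(closest_pair(group1))  # rows 0 and 1 are the closest pair
print(closest_pair(group2))  # rows 1 and 2 are the closest pair
```

The matrix needs memory proportional to the square of the number of points, which is why a more efficient algorithm becomes attractive for larger groups.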

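The obvious solutions have already been posted here. Basically, they amount to computing a variance.

Since you mentioned percentages: given [1, 2, 3] and [101, 103, 105], which of the two should come out as the more similar list? If the answer is the first, then never mind. If it's the second, you'll have to normalize the variance by the mean.

The solution is (SquareMean - Mean^2) / Mean^2, where SquareMean = (a^2 + b^2 + c^2) / 3 and Mean = (a + b + c) / 3.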
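
As a quick illustration of that normalization, here is a minimal sketch in plain Python. The function name `normalized_variance` is made up for this example, and it assumes the mean of the list is not zero.

```python
def normalized_variance(values):
    n = len(values)
    mean = sum(values) / n                        # (a + b + c) / 3
    square_mean = sum(v * v for v in values) / n  # (a^2 + b^2 + c^2) / 3
    # Population variance divided by the squared mean.
    # Undefined when the mean is 0.
    return (square_mean - mean ** 2) / mean ** 2

print(normalized_variance([1, 2, 3]))        # about 0.167
print(normalized_variance([101, 103, 105]))  # about 0.00025
```

The plain variance of [1, 2, 3] is smaller than that of [101, 103, 105], but after dividing by the squared mean the second list comes out as far more uniform, which is the "percentage" view of similarity.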

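I'm not sure how well this works, but my thought was to try using the standard deviation, since (in theory) lists with similar values should also have similar deviations?

In this case [5, 15, 10] gets a (sample) standard deviation of 5 and [6, 14, 18] gets 6.1101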
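
Those figures can be reproduced with NumPy, assuming the sample standard deviation (dividing by n - 1) is what was intended. `sample_std` is just an illustrative wrapper around `np.std`.

```python
import numpy as np

def sample_std(values):
    # ddof=1 divides by n - 1, giving the sample standard deviation.
    return float(np.std(values, ddof=1))

print(sample_std([5, 15, 10]))  # 5.0
print(sample_std([6, 14, 18]))  # about 6.1101
print(sample_std([6, 14, 8]))   # about 4.1633, for the list as given in the question
```

One caveat with this idea: the standard deviation only measures how spread out a list is, so two lists with very different values, such as [5, 15, 10] and [105, 115, 110], would get the same score.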




