Question

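I have 3-dimensional points with importance factors.

Each user has six points. For example, Charlie has 6 points: (22, 44, 55) is his first point, with an importance factor of 3; (10, 0) is his second vector, with an importance factor of 2.8; and so on down to his sixth point, (100, 300, 200), with an importance factor of 0.4.

What I want to do is find the person most similar to Charlie without going through every other person. Essentially, this means minimizing the following function for each user (that is, matching up the right six points of that user with Charlie's):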

pythagoras(point, point2) * max(importance_factor, importance_factor2) * (abs(importance_factor - importance_factor2) + 1)
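For concreteness, the per-pair cost above could be written roughly like this. It is only a sketch: `pair_cost` is a hypothetical name, and `pythagoras` is assumed to mean the Euclidean distance between the two points.

```python
import math


def pair_cost(point, factor, point2, factor2):
    """Cost of pairing two weighted points; lower means more similar."""
    # Assumption: "pythagoras" is the straight-line (Euclidean) distance.
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(point, point2)))
    # A large factor on either side, or a mismatch between the two factors,
    # scales the distance up.
    return distance * max(factor, factor2) * (abs(factor - factor2) + 1)


# Illustrative values only: distance 5, equal factors of 1.
example_cost = pair_cost((0, 0, 0), 1, (3, 4, 0), 1)
```

Note that two identical points always cost 0, whatever their importance factors are, because the distance term is 0.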

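Then, I find the user most similar to Charlie overall by choosing the one with the lowest cost. I have written the code the dumb way (doing a lot of loops), but I am looking for a way to properly handle the fact that there are multiple points and importance factors.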
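The loop-heavy baseline might look something like the sketch below. It assumes that "matching up the right six points" means a one-to-one pairing that minimizes the summed cost, and it simply tries every pairing (6! = 720 per user). The names `pair_cost`, `user_cost` and `most_similar` are hypothetical, and each user is assumed to be a list of `(point, factor)` tuples.

```python
import math
from itertools import permutations


def pair_cost(point, factor, point2, factor2):
    # Assumption: "pythagoras" is the Euclidean distance.
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(point, point2)))
    return distance * max(factor, factor2) * (abs(factor - factor2) + 1)


def user_cost(user_a, user_b):
    """Lowest total cost over all one-to-one pairings of the two point lists."""
    best = float("inf")
    for ordering in permutations(user_b):
        total = sum(pair_cost(p, f, p2, f2)
                    for (p, f), (p2, f2) in zip(user_a, ordering))
        best = min(best, total)
    return best


def most_similar(target, others):
    """Return the name in `others` whose points are cheapest to match to `target`."""
    return min(others, key=lambda name: user_cost(target, others[name]))


# Tiny made-up example with two points per user instead of six.
charlie = [((0, 0, 0), 1.0), ((10, 10, 10), 2.0)]
candidates = {
    "a": [((10, 10, 10), 2.0), ((0, 0, 0), 1.0)],  # same points, other order
    "b": [((5, 5, 5), 1.0), ((50, 50, 50), 2.0)],
}
closest = most_similar(charlie, candidates)
```

This still visits every user, so it is only the reference that a faster approach would need to reproduce.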

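I started looking into spatial indexes, but I don't think they would work because I have multiple points. Perhaps I could unroll the points into a single higher-dimensional point? So instead of six points in 3 dimensions, I would have 1 point in 18 dimensions? That still can't handle the importance factors, but it would be better than nothing.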
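The unrolling idea could be sketched as follows. `unroll` and `euclidean` are hypothetical helper names, and the sketch assumes every user's points come in a fixed, comparable order, which the question does not guarantee.

```python
import math


def unroll(points):
    """Flatten a list of 3-D points into one vector (six points give 18 numbers)."""
    return [coord for point in points for coord in point]


def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Made-up points, purely for illustration.
six_points = [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15), (16, 17, 18)]
same_points_reordered = list(reversed(six_points))

vector = unroll(six_points)  # one point in 18 dimensions
order_gap = euclidean(vector, unroll(same_points_reordered))
```

`order_gap` is non-zero even though both users hold exactly the same six points, so the unrolled form depends on point order as well as ignoring the importance factors.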

Unfortunately, I can't use vectors and cosines here, because (1, 1, 1) and (400, 400, 400) point in the same direction but are very different things.
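A quick check of why the cosine approach fails here, as a minimal sketch with hypothetical helper names:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; it ignores their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean(u, v):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


a = (1, 1, 1)
b = (400, 400, 400)

cos_ab = cosine_similarity(a, b)  # about 1.0: the directions are identical
dist_ab = euclidean(a, b)         # 399 * sqrt(3): the points are far apart
```

Cosine similarity rates the two points as a perfect match, while the distance-based cost treats them as far apart.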

Any ideas?

Answer 1

Since you haven't gotten any answers yet, I thought I would at least contribute some thoughts. I have used a Python k-d tree module for quickly searching for nearest-neighbor points:
http://code.google.com/p/python-kdtree/downloads/detail?name=kdtree.py
It accepts points of any dimension, as long as all of them have the same dimension.

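I'm not sure how you would apply the "importance" weights, but here is just a brainstorm on how to use the kdtree module to at least get the closest "people" to each point of a given person: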

import numpy
from kdtree import KDTree
from itertools import chain


class PersonPoint(object):
    """A point tagged with its owner and importance factor.

    Supports len(), indexing and iteration over its coordinates.
    """

    def __init__(self, person, point, factor):
        self.person = person
        self.point = point
        self.factor = factor

    def __repr__(self):
        return '<%s: %s, %0.2f>' % (self.person,
            ['%0.2f' % p for p in self.point], self.factor)

    def __iter__(self):
        # __iter__ must return an iterator, not the list itself.
        return iter(self.point)

    def __len__(self):
        return len(self.point)

    def __getitem__(self, i):
        return self.point[i]


# Random sample data: six 3-D points per person, each with a random factor.
people = {}
for name in ('bill', 'john', 'mary', 'jenny', 'phil', 'george'):
    factors = numpy.random.rand(6)
    points = numpy.random.rand(6, 3).tolist()
    people[name] = [PersonPoint(name, p, f) for p, f in zip(points, factors)]

bill_points = people['bill']
# Pool everyone else's points into one flat list for the tree.
others = list(chain(*[people[name] for name in people if name != 'bill']))

tree = KDTree.construct_from_data(others)

for point in bill_points:
    # t=1 means only return the 1 closest.
    # You could set it higher to return more.
    # A single formatted argument prints the same on Python 2 and 3.
    print("%s => %s" % (point, tree.query(point, t=1)[0]))

Result:

<bill: ['0.22', '0.64', '0.14'], 0.07> =>
    <phil: ['0.23', '0.54', '0.11'], 0.90>

<bill: ['0.31', '0.87', '0.16'], 0.88> =>
    <phil: ['0.36', '0.80', '0.14'], 0.40>

<bill: ['0.34', '0.64', '0.25'], 0.65> =>
    <jenny: ['0.29', '0.77', '0.28'], 0.40>

<bill: ['0.24', '0.90', '0.23'], 0.53> =>
    <jenny: ['0.29', '0.77', '0.28'], 0.40>

<bill: ['0.50', '0.69', '0.06'], 0.68> =>
    <phil: ['0.36', '0.80', '0.14'], 0.40>

<bill: ['0.13', '0.67', '0.93'], 0.54> =>
    <jenny: ['0.05', '0.62', '0.94'], 0.84>

I figured that with this result, you could look at the most frequently matched "person" and then consider the weights. Or maybe you could total up the importance factors in the results and take the highest-rated one. That way, if mary matched only once but had a factor of around 10, and phil matched 3 times but totaled only 5, mary might be more relevant?
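Both tallies could be sketched like this, using the (matched person, factor) pairs copied from the sample output above. The variable names are just for illustration, and summing the matched factors is only one possible way to weight the votes.

```python
from collections import Counter, defaultdict

# (matched person, importance factor) for each of bill's six points,
# in the order they appear in the sample output.
matches = [
    ("phil", 0.90), ("phil", 0.40), ("jenny", 0.40),
    ("jenny", 0.40), ("phil", 0.40), ("jenny", 0.84),
]

# Option 1: how often each person was the nearest neighbor.
match_counts = Counter(name for name, _ in matches)

# Option 2: total importance factor collected by each person.
factor_totals = defaultdict(float)
for name, factor in matches:
    factor_totals[name] += factor

best_by_total = max(factor_totals, key=factor_totals.get)
```

In this sample phil and jenny tie on match count with three each, so the factor totals are what separate them.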

I know you have a more robust function for computing the index, but it would require going through every point in your collection.
