Since you haven't gotten any answers yet, I thought I would at least contribute some thoughts. I have used a Python k-d tree module for quickly finding nearest-neighbor points:
http://code.google.com/p/python-kdtree/downloads/detail?name=kdtree.py
It accepts points of any dimensionality, as long as all of them have the same length.
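To show the idea behind such a module, here is a minimal pure-Python sketch of a k-d tree nearest-neighbor search. It is not the API of the linked kdtree.py; `build_kdtree` and `nearest` are hypothetical names, and it assumes Python 3.8+ for `math.dist`. Like the module, it works in any number of dimensions as long as every point has the same length:

```python
import math


def build_kdtree(points, depth=0):
    """Recursively build a k-d tree from a list of equal-length tuples.

    Each node is (point, axis, left, right); None marks an empty subtree.
    """
    if not points:
        return None
    k = len(points[0])              # dimensionality, taken from the first point
    axis = depth % k                # cycle through the axes level by level
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2          # the median point becomes the split node
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def nearest(node, target, best=None):
    """Return the stored point closest (Euclidean distance) to target."""
    if node is None:
        return best
    point, axis, left, right = node
    if best is None or math.dist(point, target) < math.dist(best, target):
        best = point
    # Descend first into the side of the split that contains the target.
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, best)
    # Cross the splitting plane only if it is closer than the best match so far.
    if abs(diff) < math.dist(best, target):
        best = nearest(far, target, best)
    return best


if __name__ == "__main__":
    pts = [(2, 3, 1), (5, 4, 0), (9, 6, 2), (4, 7, 5), (8, 1, 3), (7, 2, 4)]
    print(nearest(build_kdtree(pts), (9, 2, 3)))  # (8, 1, 3)
```

The pruning test is what makes the tree faster than scanning everything: whole subtrees are skipped when the splitting plane is farther away than the best match already found.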
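I'm not sure how you would apply the "importance" weights, but here is a brainstorm of how to use the kdtree module to at least find the closest "people" to each point of a given person: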
```python
from __future__ import print_function  # works on both Python 2 and 3
from itertools import chain

import numpy
from kdtree import KDTree  # kdtree.py from the link above


class PersonPoint(object):
    """A point tagged with the person it belongs to and an importance factor."""

    def __init__(self, person, point, factor):
        self.person = person
        self.point = point
        self.factor = factor

    def __repr__(self):
        return '<%s: %s, %0.2f>' % (self.person,
                ['%0.2f' % p for p in self.point], self.factor)

    # The sequence methods below let KDTree treat a PersonPoint
    # like a plain list of coordinates.
    def __iter__(self):
        return iter(self.point)  # __iter__ must return an iterator

    def __len__(self):
        return len(self.point)

    def __getitem__(self, i):
        return self.point[i]


# Give every person 6 random 3-D points, each with a random factor.
people = {}
for name in ('bill', 'john', 'mary', 'jenny', 'phil', 'george'):
    factors = numpy.random.rand(6)
    points = numpy.random.rand(6, 3).tolist()
    people[name] = [PersonPoint(name, p, f) for p, f in zip(points, factors)]

bill_points = people['bill']
# Index every point that does not belong to bill.
others = list(chain.from_iterable(people[name] for name in people if name != 'bill'))
tree = KDTree.construct_from_data(others)

for point in bill_points:
    # t=1 means only return the 1 closest.
    # You could set it higher to return more.
    print(point, "=>", tree.query(point, t=1)[0])
```
Output:
```
<bill: ['0.22', '0.64', '0.14'], 0.07> => <phil: ['0.23', '0.54', '0.11'], 0.90>
<bill: ['0.31', '0.87', '0.16'], 0.88> => <phil: ['0.36', '0.80', '0.14'], 0.40>
<bill: ['0.34', '0.64', '0.25'], 0.65> => <jenny: ['0.29', '0.77', '0.28'], 0.40>
<bill: ['0.24', '0.90', '0.23'], 0.53> => <jenny: ['0.29', '0.77', '0.28'], 0.40>
<bill: ['0.50', '0.69', '0.06'], 0.68> => <phil: ['0.36', '0.80', '0.14'], 0.40>
<bill: ['0.13', '0.67', '0.93'], 0.54> => <jenny: ['0.05', '0.62', '0.94'], 0.84>
```
I figured that with these results, you could look at which "person" was matched most often, or take the weights into account. For example, you could total up the importance factors of each person's matches and pick the highest-scoring one. That way, if mary matched only once but had a factor of 10, while phil matched three times but his factors only totaled 5, mary might be the more relevant match.
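Both scoring ideas fit in a few lines. This is a minimal sketch, assuming the matches have already been reduced to `(person, factor)` pairs (for example `(m.person, m.factor)` for each match `m` returned by the tree query); the helper `rank_matches` and the sample data are hypothetical, chosen to mirror the mary/phil example:

```python
from collections import Counter, defaultdict


def rank_matches(matches):
    """Score each matched person two ways.

    matches: list of (person, factor) pairs, one per nearest-neighbor hit.
    Returns (match counts, summed factors), both keyed by person.
    """
    counts = Counter(person for person, _ in matches)
    totals = defaultdict(float)
    for person, factor in matches:
        totals[person] += factor
    return counts, dict(totals)


# Hypothetical matches: mary matched once with factor 10,
# phil matched three times with factors totaling 5.
matches = [('mary', 10.0), ('phil', 1.0), ('phil', 2.5), ('phil', 1.5)]
counts, totals = rank_matches(matches)
most_frequent = counts.most_common(1)[0][0]   # 'phil'
highest_weight = max(totals, key=totals.get)  # 'mary'
print(most_frequent, highest_weight)          # phil mary
```

Counting favors phil while summing the factors favors mary, which is the trade-off between the two approaches.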
I know you have a more robust function for building an index, but it would need to go through every point in your collection.
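For comparison, a brute-force search computes the distance from each query point to every point in the collection, which is the cost a k-d tree tries to avoid. A minimal NumPy sketch (the function name `brute_force_nearest` and the sample arrays are hypothetical) that can also be handy for checking the tree's answers on small data:

```python
import numpy as np


def brute_force_nearest(queries, data):
    """Index of the nearest row of `data` for each row of `queries`.

    Compares every query with every stored point: O(len(queries) * len(data)).
    """
    queries = np.asarray(queries, dtype=float)
    data = np.asarray(data, dtype=float)
    # Pairwise squared Euclidean distances via broadcasting: shape (n_queries, n_data).
    d2 = ((queries[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


data = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.2, 0.9, 0.1]])
queries = np.array([[0.1, 0.1, 0.0], [0.3, 0.8, 0.2]])
print(brute_force_nearest(queries, data))  # [0 2]
```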