English 中文(简体)
关于如何设计数据结构的咨询意见
原标题:Advice on how to design a datastructure

I have a file from which I am reading the data. I need advice on how to design the data structure which does the following: So, the data is of form

id_1::id_2::similiarity_score

现在,虽然数据形式如此,但也意味着:

id_2::id_1::same_similiarity_Score

So, what I want is a datastructure which when I use in program. So lets say I want to use this data in order to find which two items are similar

object.maxSimiliarity(object_id_1)
returns object_id_2 # has max score

但是,该物体_id_1也可在数据库中产品_id_2栏......

在数据库中,可以有以下两种形式:

 object_id_1:: object_id_2::score
 or object_id2::object_id_1::score

因此,我不想以这样的方式设计这种数据结构。

k_1, k_2:: value <--> k_2,k_1::value
最佳回答

It seems to me that you could use the scores to build lists of best to worst matches:

d = {
      id1 :  [id_best_match_to_id1, id_next_best_match_to_id1, ..., id_worst_match_to_id1],
      id2 :  [id_best_match_to_id2, id_next_best_match_to_id2, ..., id_worst_match_to_id2],
     ...
}

如果需要保留相似的分数,则采用<代码>(id_best_match_to_id1, 相似性_score_to_id1)的标记清单。

我看不出利用这种相似性的方法,即:<条码>模拟(x,y)=模拟(y,x)<代码>。

问题回答

这类事情的一般trick计是找到一种通俗化,这一功能将特定阶层的所有成员描绘成同一个目标。 在此情况下,您可以通过对头两个部分进行分类来实现:B:A:核心到A:B:核心,而A:B:核心。

The data look very much like nodes and edges of a weighted graph. If a is similar to b with a score 5.0, and similar to c with a score 1.0, you might visualise it thus:

                                    a
                                   / 
                                  /   
                                5.0   1.0
                                /       
                               b         c

Networkx is a python lib that provides ready-made graph objects and algorithms. Loading up your data into a weighted multigraph (that is, it supports multiple connections between nodes A--B and B--A is trivial. After that, getting the most similar object given an object id is a case of finding the node, finding it s most weighted edge and returning the node at the end of it.

import networkx as nx

## Test data
data = """
a::b::2
b::a::3
a::c::5
b::e::1
"""
rows = (row.split( :: ) for row in data.split())


class Similarity(object):
    def __init__(self, data):
        self.g = nx.MultiGraph()
        self.load(data)

    def load(self, data):
        ## Turn the row into data suitable for networkx graph
        rows = ((row[0], row[1], float(row[2])) for row in data)
        self.g.add_weighted_edges_from(rows)

    def most_similar(self, obj_id):
        ## Get edges from obj_id node
        edges = self.g.edges_iter(obj_id, data=True)
        ## Sort by weight, get first, get joined node
        return sorted([(i[0], i[1], i[2].get( weight , 0)) for i in edges])[-1][1]


sc = Similarity(rows)
sc.most_similar( a ) ##  c 
## Add some more data linking a --> f with a high score
sc.load([( a ,  f , 10)])
sc.most_similar( a ) ##  f 




相关问题
How to add/merge several Big O s into one

If I have an algorithm which is comprised of (let s say) three sub-algorithms, all with different O() characteristics, e.g.: algorithm A: O(n) algorithm B: O(log(n)) algorithm C: O(n log(n)) How do ...

Grokking Timsort

There s a (relatively) new sort on the block called Timsort. It s been used as Python s list.sort, and is now going to be the new Array.sort in Java 7. There s some documentation and a tiny Wikipedia ...

Manually implementing high performance algorithms in .NET

As a learning experience I recently tried implementing Quicksort with 3 way partitioning in C#. Apart from needing to add an extra range check on the left/right variables before the recursive call, ...

Print possible strings created from a Number

Given a 10 digit Telephone Number, we have to print all possible strings created from that. The mapping of the numbers is the one as exactly on a phone s keypad. i.e. for 1,0-> No Letter for 2->...

Enumerating All Minimal Directed Cycles Of A Directed Graph

I have a directed graph and my problem is to enumerate all the minimal (cycles that cannot be constructed as the union of other cycles) directed cycles of this graph. This is different from what the ...

Quick padding of a string in Delphi

I was trying to speed up a certain routine in an application, and my profiler, AQTime, identified one method in particular as a bottleneck. The method has been with us for years, and is part of a "...

热门标签