Computing degree of similarity among a group of sets

Suppose there are 4 sets:

s1={1,2,3,4};
s2={2,3,4};
s3={2,3,4,5};
s4={1,3,4,5};

Is there a standard metric that expresses the degree of similarity of this group of 4 sets?

Thank you for suggesting the Jaccard method. However, it seems to be pairwise. How can I compute the degree of similarity of the whole group of sets?

Answers

Pairwise, you can compute the Jaccard distance of two sets, which is one minus the size of their intersection divided by the size of their union. It's simply a distance between the two sets if you treat them as vectors of booleans in a space where {1, 2, 3…} are all unit vectors.
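A minimal sketch of the pairwise computation on the four sets from the question. The function name `jaccard_distance` and the convention that two empty sets have distance 0 are assumptions made for this example.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """Jaccard distance: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0  # assumed convention: two empty sets count as identical
    return 1.0 - len(a & b) / len(union)

sets = {
    "s1": {1, 2, 3, 4},
    "s2": {2, 3, 4},
    "s3": {2, 3, 4, 5},
    "s4": {1, 3, 4, 5},
}

# Distance for every unordered pair of sets.
pairwise = {
    (x, y): jaccard_distance(sets[x], sets[y])
    for x, y in combinations(sets, 2)
}

for (x, y), d in pairwise.items():
    print(f"{x}-{y}: {d:.2f}")
```

For example, s1 and s2 share 3 of the 4 elements in their union, so their distance is 0.25.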

Your question isn't very specific. But I suppose you mean something like the "edit distance" between them? I.e. how much you need to change s1 to get to s2?

Check out the Wikipedia article on Edit distance.
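One way to read "edit distance" for sets, assumed here rather than taken from the Wikipedia article, is the number of single-element insertions and deletions needed to turn one set into the other. That count equals the size of the symmetric difference. The name `set_edit_distance` is hypothetical.

```python
def set_edit_distance(a, b):
    """Number of element insertions/deletions needed to turn a into b.

    Assumption: each added or removed element costs 1, so the result is
    the size of the symmetric difference of the two sets.
    """
    return len(a ^ b)

s1 = {1, 2, 3, 4}
s2 = {2, 3, 4}
s3 = {2, 3, 4, 5}
s4 = {1, 3, 4, 5}

print(set_edit_distance(s1, s2))  # remove 1 -> 1 edit
print(set_edit_distance(s1, s3))  # remove 1, add 5 -> 2 edits
print(set_edit_distance(s2, s4))  # remove 2, add 1 and 5 -> 3 edits
```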

As Tobu said, I'd use the Jaccard index, which is just the size of the intersection divided by the size of the union of the sets.
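The question asks about the whole group rather than pairs. Two possible extensions are sketched below, neither of which is claimed to be a standard metric: the intersection of all sets divided by the union of all sets, and the mean of the pairwise Jaccard indices. The function names are hypothetical.

```python
from itertools import combinations

def jaccard_index(a, b):
    """Pairwise Jaccard index: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def group_jaccard(sets):
    """Elements common to ALL sets divided by elements in ANY set."""
    sets = list(sets)
    union = set().union(*sets)
    if not union:
        return 1.0  # assumed convention: nothing to disagree about
    common = set(sets[0]).intersection(*sets[1:])
    return len(common) / len(union)

def mean_pairwise_jaccard(sets):
    """Average Jaccard index over all unordered pairs."""
    pairs = list(combinations(sets, 2))
    if not pairs:
        raise ValueError("need at least two sets")
    return sum(jaccard_index(a, b) for a, b in pairs) / len(pairs)

group = [{1, 2, 3, 4}, {2, 3, 4}, {2, 3, 4, 5}, {1, 3, 4, 5}]

print(group_jaccard(group))          # {3, 4} out of {1, 2, 3, 4, 5} -> 0.4
print(mean_pairwise_jaccard(group))  # about 0.617
```

The first measure is strict, since a single outlier set can drive it to zero. The mean of the pairwise values degrades more gradually.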

You could compute the size of the intersection between each pair of sets.
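A short sketch of that idea, assuming "each set" means every pair of sets: build a matrix whose entry (i, j) is the number of elements shared by set i and set j. The variable names are illustrative.

```python
names = ["s1", "s2", "s3", "s4"]
group = [{1, 2, 3, 4}, {2, 3, 4}, {2, 3, 4, 5}, {1, 3, 4, 5}]

# matrix[i][j] = number of elements shared by set i and set j;
# the diagonal holds each set's own size.
matrix = [[len(a & b) for b in group] for a in group]

print("    " + "  ".join(names))
for name, row in zip(names, matrix):
    print(name + "  " + "   ".join(str(n) for n in row))
```

Raw intersection sizes are not normalised by set size, so large sets tend to look more similar than small ones.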

You could compute the Euclidean distance between them, and build a dendrogram from that to visualize similarity.
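A rough sketch of this approach using only the standard library. It assumes each set is encoded as a 0/1 indicator vector over the union of all elements and that clusters are merged by single linkage, which is one choice among several. The merge list it produces is the data a dendrogram would display; drawing it would need a plotting library, which is not shown. All function names are hypothetical.

```python
import math

def to_indicator(s, universe):
    """Encode a set as a 0/1 vector over the ordered universe."""
    return [1 if x in s else 0 for x in universe]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def single_linkage(vectors):
    """Naive agglomerative clustering.

    Returns a list of (cluster_a, cluster_b, height) merges, where each
    cluster is a tuple of input indices and height is the merge distance.
    """
    clusters = [(i,) for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(euclidean(vectors[p], vectors[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

group = [{1, 2, 3, 4}, {2, 3, 4}, {2, 3, 4, 5}, {1, 3, 4, 5}]
universe = sorted(set().union(*group))
vectors = [to_indicator(s, universe) for s in group]

merges = single_linkage(vectors)
for a, b, height in merges:
    print(a, b, round(height, 3))
```

With this encoding, the Euclidean distance between two sets is the square root of the size of their symmetric difference. Here s1, s2 and s3 merge at height 1.0 and s4 joins last at about 1.414.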
