Question

I am looking for an abstract data structure which represents a collection of sets such that no set in the collection is a subset of another set in the collection.

This means that on insert the following conditions will be met:

A. Inserting an element that is already a subset of another element will return the original collection.

B. Inserting an element that is a superset of any other elements will result in a collection with the superset added and the subsets removed.

Assuming an ordering on the elements of the set, then a prefix tree can be used to represent the collection. This permits condition A to be handled very quickly (ie it takes no longer to check the condition than it would to insert the subset) however meeting condition B takes time.

I am wondering if there is data structure that allows B to be met quickly as well.

Answer 1

The trivial approach would be to keep a list of sets and perform a linear search through that for every incoming set (testing if the incoming is a subset).

This obviously runs in O(n) time for the linear search and possibly O(m) size for the size of the incoming set. Thus O(n*m) total time (number of sets vs. size of each set).

The most obvious optimization, of course, is to index on set sizes. Then you only test each incoming set against those which are of equal or larger size. (A set cannot be a subset of any smaller set, duh!).

The next optimization that comes to mind is to create in index of elements. Thus for each incoming set you d find the intersection of each sets containing each of the elements. In other words if, for incoming set {a,b,c}, we find that element {a} exists in sets A, B, and D, element {b} exists in B, E, and F, and {c} exists in A, B, and Z ... then the incoming set is a subset of B (the intersection of {A, B, D}, {B, E, F}, and {A, B, Z}).

So, that sounds like O(m*log(n)) complexity to me. (We have to perform hashed searches on each element of each incoming set). Insertions should also be on the same order (inserting the new set s ID into each of the element s maps). (In Big-O analysis 2*O(mlog(n)) reduces down to O(mlog(n)), of course).

Answer 2

A trivial idea, which will work in O(K) where K is size of element being added.

keep sets in whatever way u want
keep map of set_id -> set_size
keep map of object -> set_id

both A and B are O(K).

Answer 3

If the individual members of your sets A, B, ... are mapped to distinct (and relatively) prime numbers, and alongside each set you store the product of all the members as p(A), p(B) etc. then subsets and supersets can be found by whether p(X) is a factor of p(Y) or vice versa.

You could end up with some very large numbers I guess, but it works in theory.

For example:

if [a b c d] -> [2 3 5 7], p(abc) = 30, p(abd) = 42, p(bc) = 15, p(abcd) = 210

Answer 4

What a dorky site! I have now registered, so may in due course be able to comment on stuff from yesterday. Until then, however...

@Stephen C: although I believe my English to be adequate I seem to have acquired an explicator: he has missed bits out, however, and his comment should read as follows:

@Stephen C: finding the factors of an arbitrary number is indeed NP complete, but not relevant here. The issue is whether the smaller of two numbers exactly divides the larger, a simple modulus operation. For example, p(bc)=15 is a divisor of p(abcd)=210, so bc is a subset of abcd (as are sets abd and abc).

Adding a new set S to the existing collection of N sets is O(N), assuming that comparison and division of the large numbers takes roughly the same time regardless of N.

For each existing entry E in the collection of sets, stop if p(S) < p(E) and p(S) divides p(E) exactly. If p(S) = p(E), stop. Remove E if p(S) > p(E) and p(E) divides p(S) exactly. Add S if you get to the end of the collection. An array of BigNums would work.

@JL: if you d like to be my on-site stalker please endeavour to 1) add value 2) accurately.

友情链接