English 中文(简体)
Collection of sets containing no sets which are a subset of another in the collection
原标题:

I am looking for an abstract data structure which represents a collection of sets such that no set in the collection is a subset of another set in the collection.

This means that on insert the following conditions will be met:

A. Inserting an element that is already a subset of another element will return the original collection.

B. Inserting an element that is a superset of any other elements will result in a collection with the superset added and the subsets removed.

Assuming an ordering on the elements of the set, then a prefix tree can be used to represent the collection. This permits condition A to be handled very quickly (ie it takes no longer to check the condition than it would to insert the subset) however meeting condition B takes time.

I am wondering if there is data structure that allows B to be met quickly as well.

最佳回答

The trivial approach would be to keep a list of sets and perform a linear search through that for every incoming set (testing if the incoming is a subset).

This obviously runs in O(n) time for the linear search and possibly O(m) size for the size of the incoming set. Thus O(n*m) total time (number of sets vs. size of each set).

The most obvious optimization, of course, is to index on set sizes. Then you only test each incoming set against those which are of equal or larger size. (A set cannot be a subset of any smaller set, duh!).

The next optimization that comes to mind is to create in index of elements. Thus for each incoming set you d find the intersection of each sets containing each of the elements. In other words if, for incoming set {a,b,c}, we find that element {a} exists in sets A, B, and D, element {b} exists in B, E, and F, and {c} exists in A, B, and Z ... then the incoming set is a subset of B (the intersection of {A, B, D}, {B, E, F}, and {A, B, Z}).

So, that sounds like O(m*log(n)) complexity to me. (We have to perform hashed searches on each element of each incoming set). Insertions should also be on the same order (inserting the new set s ID into each of the element s maps). (In Big-O analysis 2*O(mlog(n)) reduces down to O(mlog(n)), of course).

问题回答

A trivial idea, which will work in O(K) where K is size of element being added.

  • keep sets in whatever way u want
  • keep map of set_id -> set_size
  • keep map of object -> set_id

both A and B are O(K).

If the individual members of your sets A, B, ... are mapped to distinct (and relatively) prime numbers, and alongside each set you store the product of all the members as p(A), p(B) etc. then subsets and supersets can be found by whether p(X) is a factor of p(Y) or vice versa.

You could end up with some very large numbers I guess, but it works in theory.

For example:

if [a b c d] -> [2 3 5 7], p(abc) = 30, p(abd) = 42, p(bc) = 15, p(abcd) = 210

What a dorky site! I have now registered, so may in due course be able to comment on stuff from yesterday. Until then, however...

@Stephen C: although I believe my English to be adequate I seem to have acquired an explicator: he has missed bits out, however, and his comment should read as follows:


@Stephen C: finding the factors of an arbitrary number is indeed NP complete, but not relevant here. The issue is whether the smaller of two numbers exactly divides the larger, a simple modulus operation. For example, p(bc)=15 is a divisor of p(abcd)=210, so bc is a subset of abcd (as are sets abd and abc).

Adding a new set S to the existing collection of N sets is O(N), assuming that comparison and division of the large numbers takes roughly the same time regardless of N.

For each existing entry E in the collection of sets, stop if p(S) < p(E) and p(S) divides p(E) exactly. If p(S) = p(E), stop. Remove E if p(S) > p(E) and p(E) divides p(S) exactly. Add S if you get to the end of the collection. An array of BigNums would work.


@JL: if you d like to be my on-site stalker please endeavour to 1) add value 2) accurately.





相关问题
The Fastest DataStructure to Filter with in C#

Currently we are filtering and sorting data with a datatable. /// <summary> /// Filters the data table and returns a new data table with only the filtered rows. /// </summary>...

Efficient queue in Haskell

How can I efficiently implement a list data structure where I can have 2 views to the head and end of the list, that always point to a head a tail of a list without expensive calls to reverse. i.e: ...

Java large datastructure for storing a matrix

I need to store a 2d matrix containing zip codes and the distance in km between each one of them. My client has an application that calculates the distances which are then stored in an Excel file. ...

Holding onto items after a postback

I have an ASP.NET web application and I want to be able to take items from a master list and store them temporarliy into one of four other lists. The other lists need to survive post backs so that ...

negative number in the stack

I am a new student in the compilers world ^_^ and I want to know is legal represent negative number in the stack. For example: infix: 1-5=-4 postfix: 15- The statements are: push(1) push(5) x=...

What type of struct/container would you use in this instance?

I am trying to figure out what type of structure or container I should use for a quick project. I need to have an unknown number of sets that will be entered from the GUI (each one will have a name, ...

热门标签