If your sets are stored in s, for example:
s = [set([1, 2]), set([1, 3]), set([1, 2, 3]), set([2, 4])]
Then you can use itertools.combinations to take them two by two and calculate each intersection (note that, as Alex pointed out, combinations is only available since Python 2.6). Here with a list comprehension (just for the sake of the example):
from itertools import combinations
[ i[0] & i[1] for i in combinations(s,2) ]
Or, in a loop, which is probably what you need:
for i in combinations(s, 2):
    inter = i[0] & i[1]
    # processes the intersection set result "inter"
So, to have the length of each one of them, that "processing" would be:
l = len(inter)
This is quite efficient, since it uses an iterator to produce each combination on demand and does not prepare all of them in advance.
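As a self-contained sketch of the loop above, the pieces can be wrapped in a small generator function. The name intersection_sizes is only illustrative and not part of the original answer; the sets are the same sample values as in "s".

```python
from itertools import combinations

def intersection_sizes(sets):
    """Yield the size of the intersection of every pair of sets."""
    # combinations() hands out the pairs one at a time, in input order.
    for a, b in combinations(sets, 2):
        yield len(a & b)

s = [set([1, 2]), set([1, 3]), set([1, 2, 3]), set([2, 4])]
sizes = list(intersection_sizes(s))
```

Unpacking each pair into a and b is equivalent to indexing i[0] and i[1]; it is just a little easier to read.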
Edit: Note that with this method, "s" does not have to be a list; it can be any iterable that yields sets, such as a generator. It could be much slower though, depending on how you generate these elements. Be aware that combinations copies its whole input into an internal tuple before producing the first pair, so the sets themselves are still all held in memory at the same time; only the pairs are produced lazily (not that it should be a problem in your case).
For example, if each set is made from a function gen:
def gen(parameter):
    while more_sets():
        # ... some code to generate the next set x
        yield x

with open("results", "wt") as f_results:
    for i in combinations(gen("data"), 2):
        inter = i[0] & i[1]
        f_results.write("%d\n" % len(inter))
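Since more_sets() and the body of gen are left open above, here is a runnable sketch of the same pattern under some assumptions: gen_sets is a hypothetical generator that builds one set per row of integers, and the output goes to an in-memory io.StringIO buffer (Python 3) instead of a file, so the result is easy to inspect.

```python
import io
from itertools import combinations

def gen_sets(rows):
    # Hypothetical stand-in for gen(): yields one set per row of integers.
    for row in rows:
        yield set(row)

def write_intersection_sizes(sets, out):
    # Write the size of each pairwise intersection, one per line.
    for a, b in combinations(sets, 2):
        out.write("%d\n" % len(a & b))

buf = io.StringIO()
write_intersection_sizes(gen_sets([[1, 2], [1, 3], [1, 2, 3]]), buf)
result = buf.getvalue()
```

Passing the output object in as a parameter means the same function works with a real file opened via open("results", "wt").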
Edit 2: How to collect indices (following redrat's comment).
Besides the quick solution I gave in the comments, a more efficient way to collect the set indices would be to have a list of (index, set) tuples instead of a list of sets.
Example with new format:
s = [(0, set([1, 2])), (1, set([1, 3])), (2, set([1, 2, 3]))]
If you are building this list to calculate the combinations anyway, it should be simple to adapt to your new requirements. The main loop becomes:
with open("results", "wt") as f_results:
    for i in combinations(s, 2):
        inter = i[0][1] & i[1][1]
        f_results.write("length of %d & %d: %d\n" % (i[0][0], i[1][0], len(inter)))
In the loop, i[0] and i[1] are each a tuple (index, set), so i[0][1] is the first set and i[0][0] is its index.
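If the list already exists as plain sets, one possible shortcut (not part of the original answer) is to let the built-in enumerate produce the (index, set) tuples on the fly. The sketch below collects (first index, second index, intersection length) triples in a list named results, which is an illustrative choice rather than the file output used above.

```python
from itertools import combinations

s = [set([1, 2]), set([1, 3]), set([1, 2, 3])]

results = []
# enumerate(s) yields (index, set) tuples, so each combination is a pair
# of such tuples, unpacked here into (i, a) and (j, b).
for (i, a), (j, b) in combinations(enumerate(s), 2):
    results.append((i, j, len(a & b)))
```

The unpacked names i, a, j and b correspond to i[0][0], i[0][1], i[1][0] and i[1][1] in the indexed version of the loop.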