I've got a Cassandra cluster with a small number of rows (< 100). Each row has about 2 million columns. I need to get a full row (all 2 million columns), but things start failing all over the place before I can finish my read. I'd like to do some kind of buffered read.
Ideally I'd like to do something like this using Pycassa (no, this isn't the proper way to call get, it's just so you can get the idea):
results = {}
start = 0
while True:
    # Fetch blocks of size 500
    buffer = column_family.get(key, column_offset=start, column_count=500)
    if len(buffer) == 0:
        break
    # Merge these results into the main one
    results.update(buffer)
    # Update the offset
    start += len(buffer)
Pycassa (and by extension Cassandra) doesn't let you do this. Instead you need to specify a column name for column_start and column_finish. This is a problem since I don't actually know what the start or end column names will be. The special value "" can indicate the start or end of the row, but that doesn't work for any of the values in the middle.
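If the answer is to page by the last column name seen on each fetch, here's my best guess at how that would look. It's simulated in plain Python over a sorted dict since I can't share a running cluster: fake_get is a hypothetical stand-in for column_family.get, not a real pycassa call, and the skip-the-repeat handling is there because column_start appears to be inclusive.

```python
from collections import OrderedDict

# Fake "row": 2500 columns with sortable names, standing in for a wide row.
ROW = OrderedDict(("col%07d" % i, i) for i in range(2500))

def fake_get(key, column_start="", column_count=100):
    """Hypothetical stand-in for ColumnFamily.get: returns up to
    column_count columns whose names are >= column_start, in sorted order.
    "" means "from the beginning of the row"."""
    names = [n for n in ROW if n >= column_start]
    return OrderedDict((n, ROW[n]) for n in names[:column_count])

def buffered_read(key, buffer_size=500):
    results = OrderedDict()
    start = ""  # start from the beginning of the row
    while True:
        buf = fake_get(key, column_start=start, column_count=buffer_size)
        if start:
            # column_start is inclusive, so drop the column we already have
            buf.pop(start, None)
        if not buf:
            break
        results.update(buf)
        # Next page begins at the last column name we saw
        start = next(reversed(buf))
    return results
```

Running buffered_read("row1") against the fake row recovers all 2500 columns in order, so the paging logic itself seems sound; what I don't know is whether this is the intended way to drive pycassa.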
So how can I accomplish a buffered read of all the columns in a single row? Thanks.