I think this issue is best taken up on the cassandra-user mailing list; that is where people are.
Cassandra does not have automatic load balancing yet but it may do so in the not-too-distant future. The 0.5 branch may be capable of this now.
Essentially, when you bootstrap a node on an already-running system, it should find the spot in the ring that balances the load best and put itself there. As long as you add nodes one at a time (i.e. wait for one node to finish bootstrapping before adding another), that should work pretty well, provided your key distribution doesn't change too much over time.
However, your keys may change over time (especially if they are time-based), so you might want a workaround.
It depends on what you want to range-scan. If you only need to range-scan PART of the key, you could hash the bit that you don't want to range-scan, and use that hash as the first part of the key.
I'll use the term "partition" here to refer to the part of the key you don't want to range-scan.
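// Prefix the whole key with a hash of the partition, so that partitions
// spread around the ring while keys within one partition stay adjacent.
function makeWholeKey(partition, key) {
    return concat(make_hash(partition), partition, key);
}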
Now, if you want to range-scan the keys within a given partition p, you can range-scan between makeWholeKey(p, start) and makeWholeKey(p, end).
But if you want to scan across the partitions themselves, you're out of luck, because the hash prefix destroys their ordering.
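To make the key scheme concrete, here is a minimal Python sketch. It is not Cassandra client code: a sorted list stands in for an order-preserving store, and the choice of MD5 truncated to 8 hex digits for make_hash, the sample partition names, and the range_scan helper are all assumptions made for illustration.

```python
import bisect
import hashlib


def make_hash(partition: str) -> str:
    # Assumption: MD5 truncated to 8 hex digits; any fixed-width,
    # well-distributed hash would serve the same purpose.
    return hashlib.md5(partition.encode("utf-8")).hexdigest()[:8]


def make_whole_key(partition: str, key: str) -> str:
    # hash(partition) + partition + key, as in makeWholeKey above.
    return make_hash(partition) + partition + key


# A sorted list stands in for a store that keeps rows in key order.
rows = sorted(
    make_whole_key(p, k)
    for p in ("alice", "bob", "carol")
    for k in ("2010-01-01", "2010-01-02", "2010-01-03")
)


def range_scan(rows, partition, start, end):
    # Return every stored key between the two whole keys, inclusive.
    lo = bisect.bisect_left(rows, make_whole_key(partition, start))
    hi = bisect.bisect_right(rows, make_whole_key(partition, end))
    return rows[lo:hi]


result = range_scan(rows, "bob", "2010-01-01", "2010-01-02")
for row in result:
    print(row)
```

Every key returned shares the prefix make_hash("bob") + "bob", so the scan stays inside one partition even though the partitions themselves are scattered across the key space.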
However, you can give your nodes tokens that are evenly distributed across the range of make_hash() output, and you'll get evenly distributed data (assuming you have ENOUGH partitions that they don't all clump up on one or two hash values).
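As a rough sketch of that token layout in Python: the 8-hex-digit MD5 prefix used for make_hash, the four-node cluster, the partition names, and the ownership rule in owner() (each key goes to the first token at or after it, wrapping around the ring) are assumptions for illustration, not something read from Cassandra's configuration or API.

```python
import bisect
import hashlib
from collections import Counter

HASH_HEX_DIGITS = 8  # assumed width of make_hash() output


def make_hash(partition: str) -> str:
    # Assumption: MD5 truncated to a fixed number of hex digits.
    return hashlib.md5(partition.encode("utf-8")).hexdigest()[:HASH_HEX_DIGITS]


def even_tokens(node_count: int) -> list:
    # Split the hash output range into node_count equal slices and
    # format each boundary with the same width as make_hash() output.
    space = 16 ** HASH_HEX_DIGITS
    return [
        format(i * space // node_count, "0{}x".format(HASH_HEX_DIGITS))
        for i in range(node_count)
    ]


def owner(tokens: list, whole_key: str) -> str:
    # Simplified ring model: the first token >= key owns it,
    # and keys past the last token wrap around to the first.
    i = bisect.bisect_left(tokens, whole_key)
    return tokens[i % len(tokens)]


tokens = even_tokens(4)

# Count how many of 1000 hypothetical partitions land on each node.
counts = Counter(
    owner(tokens, make_hash("user{}".format(n)) + "user{}".format(n))
    for n in range(1000)
)
for token in tokens:
    print(token, counts[token])
```

With many partitions the per-node counts come out close to equal. With only a handful of partitions they can be badly skewed, which is the clumping caveat above.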