Cassandra load balancing with an ordered partitioner?
  • Posted: 2009-11-20 01:35:43
  • Tags: cassandra

So I see here that Cassandra does not have automatic load balancing, which becomes a problem when using the ordered partitioner: rows whose keys fall within a common range of values end up on relatively few machines, which then serve most of the queries.
What's The Best Practice In Designing A Cassandra Data Model?

I'm still new to Cassandra and how it works. How would one go about avoiding this issue, so that range queries are still possible? I didn't really get the idea, from the answers to the question linked above, of appending a hash to keys.

Answers

As mentioned on the other post, Cassandra 0.5 supports semiautomatic load balancing, where all you have to do is tell a node to loadbalance and it will move to a busier place on the token ring automatically.

This is covered in http://wiki.apache.org/cassandra/Operations

I think this issue is best taken up on the cassandra-user mailing list; that is where people are.

Cassandra does not have automatic load balancing yet but it may do so in the not-too-distant future. The 0.5 branch may be capable of this now.

Essentially, when you bootstrap a node on an already-running system, it should find a spot in the ring which will balance the load best and put itself there. If you add nodes one at a time (i.e. wait for one node to finish bootstrapping before adding another), that should work pretty well, as long as your key distribution doesn't change too much over time.
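
To make "a spot in the ring which will load balance best" concrete, here is a minimal sketch of one plausible heuristic: take over half of the keys of the busiest node. This is a simplified model and not Cassandra's actual bootstrap code; the function name `pick_bootstrap_token` and the dictionary input are hypothetical.

```python
def pick_bootstrap_token(keys_by_node: dict) -> str:
    """Return a token that would split the busiest node's keys roughly in half.

    Assumption: a node with token T owns every key up to and including T,
    so using the median key as the new token hands over the lower half.
    """
    # Find the node currently holding the most keys.
    busiest = max(keys_by_node, key=lambda node: len(keys_by_node[node]))
    keys = sorted(keys_by_node[busiest])
    # The middle key (lower middle for an even count) becomes the new token.
    return keys[(len(keys) - 1) // 2]


# Hypothetical cluster state: node2 holds most of the keys.
loads = {"node1": ["a", "b"], "node2": ["m", "n", "o", "p", "q", "r"]}
new_token = pick_bootstrap_token(loads)
```

A split like this is only as good as the key distribution at the moment of bootstrapping, which is why it stops helping once the distribution drifts.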

However, your keys may change over time (especially if they are time-based) so you might want a workaround.

It depends on what you want to range-scan. If you only need to range scan PART of the key, you could hash the bit that you don't want to range scan, and use that as the first part of the key.

I'll use the term "partition" here to refer to the part of the key you don't want to range scan.

// Pseudocode: make_hash() is any hash that spreads partitions evenly; concat() joins strings.
function makeWholeKey(partition, key) {
   return concat(make_hash(partition), partition, key);
}

Now if you want to range scan the keys within a given partition, you can range scan between makeWholeKey(p,start) and makeWholeKey(p,end).
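
A runnable sketch of the same idea, in Python. The choice of MD5 for make_hash, the sorted in-memory list standing in for the ordered ring, and the sample users and dates are all assumptions made for illustration.

```python
import bisect
import hashlib


def make_hash(partition: str) -> str:
    # Assumption: an MD5 hex digest serves as a fixed-width, evenly spread prefix.
    return hashlib.md5(partition.encode("utf-8")).hexdigest()


def make_whole_key(partition: str, key: str) -> str:
    # Hash first: keys of one partition stay contiguous, partitions scatter.
    return make_hash(partition) + partition + key


def range_scan(sorted_keys, partition, start, end):
    # Inclusive scan over a sorted list that stands in for the ordered ring.
    lo = bisect.bisect_left(sorted_keys, make_whole_key(partition, start))
    hi = bisect.bisect_right(sorted_keys, make_whole_key(partition, end))
    return sorted_keys[lo:hi]


# Hypothetical data: two users (the partitions), keyed by date.
rows = sorted(
    make_whole_key(user, day)
    for user in ("alice", "bob")
    for day in ("2009-11-01", "2009-11-15", "2009-12-01")
)
november = range_scan(rows, "alice", "2009-11-01", "2009-11-30")
```

Because every key of a partition shares the same hash prefix, the scan never picks up rows from another partition.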

But if you want to scan the partitions, you're out of luck.

But you can give your nodes tokens which are evenly distributed around the range of make_hash() output, and you'll get evenly distributed data (assuming you have ENOUGH partitions that it doesn't all clump up on one or two hash values).
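
As a rough sketch of what evenly distributed tokens could look like, assume make_hash() returns a 32-character hex string and that tokens are compared with keys as plain strings. The helper names `even_tokens` and `owner_index` are hypothetical, and so is the rule that a node owns everything up to and including its token.

```python
import bisect


def even_tokens(node_count: int, hex_digits: int = 32) -> list:
    # Split the hash space [0, 16**hex_digits) into node_count equal slices
    # and render each boundary as a zero-padded hex string.
    space = 16 ** hex_digits
    return [
        format(i * space // node_count, f"0{hex_digits}x")
        for i in range(node_count)
    ]


def owner_index(whole_key: str, tokens: list) -> int:
    # Assumption: the owner is the first node whose token is >= the key,
    # wrapping around to node 0 past the last token.
    return bisect.bisect_left(tokens, whole_key) % len(tokens)


tokens = even_tokens(4)
```

For four nodes this yields tokens starting with 0, 4, 8 and c, so each node covers a quarter of the hash prefix space.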

Partitioning of data across the cluster is controlled by the partitioner parameter in cassandra.yaml:

partitioner: org.apache.cassandra.dht.Murmur3Partitioner

Murmur3Partitioner hashes each row key to a token, which spreads rows evenly across the cluster and so balances the load.
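
To see why hashing the row key balances load, here is a small simulation. MD5 stands in for Murmur3 purely for illustration (Python's standard library has no Murmur3), and the ring size, node count and sample keys are assumptions.

```python
import hashlib
from collections import Counter

NODES = 4
RING = 2 ** 64  # assumed ring size for this sketch


def token_for(row_key: str) -> int:
    # Stand-in hash: the first 8 bytes of the MD5 digest, read as an integer.
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def node_for(row_key: str) -> int:
    # Each node owns one equal slice of the ring.
    return token_for(row_key) * NODES // RING


# Sequential, time-like keys: the worst case for an ordered partitioner.
counts = Counter(node_for(f"2009-11-20T{n:05d}") for n in range(10000))
```

Even though the keys are sequential, each node ends up with a similar share, whereas an ordered partitioner would place them all in one contiguous range.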

With Cassandra 2.0, a single server can own multiple tokens (256), which also helps with load balancing. Using OrderPreservingPartitioner is not good practice, and it is deprecated.




