I've got a Cassandra cluster with a fairly small number of rows (2 million or so, which I would hope is "small" for Cassandra). Each row is keyed on a unique UUID, and each row has about 200 columns (give or take a few). All in all these are pretty small rows, with no binary data or large amounts of text, just short strings.
I've just finished the initial import into the Cassandra cluster from our old database, and I've tuned the hell out of Cassandra on each machine. There were hundreds of millions of writes, but no reads. Now that it's time to USE this thing, I'm finding that read speeds are absolutely dismal. I'm doing a multiget using pycassa on anywhere from 500 to 10,000 rows at a time. Even at 500 rows, the performance is awful, sometimes taking 30+ seconds.
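For what it's worth, the read path looks roughly like the sketch below. The keyspace, column family, and node names are placeholders, and the keys are generated here just for illustration (in reality they come from our application):

```python
import uuid
import pycassa

# Placeholder keyspace, column family, and node names.
pool = pycassa.ConnectionPool('MyKeyspace',
                              server_list=['node1:9160', 'node2:9160'])
cf = pycassa.ColumnFamily(pool, 'MyColumnFamily')

# A batch of row keys; in my case these are UUIDs. Generated here only so
# the snippet is self-contained.
keys = [str(uuid.uuid4()) for _ in range(500)]

# One multiget over the whole batch. buffer_size caps how many keys go
# into each underlying multiget_slice request to the cluster.
rows = cf.multiget(keys, buffer_size=256)
```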
What would cause this type of behavior? What sort of things would you recommend after a large import like this? Thanks.