I am currently working on a recommender application, using Cassandra with Hadoop and Pig for map/reduce jobs. To take advantage of column-name properties, our team has decided to store the data as valueless columns with aggregate column names. For example, all hits for a given piece of content are stored in a column family with a single row, and each column represents one hit, using the following structure:
rowkey = single_row {
    id_content:hit_date,    (valueless column, one per hit)
    ...
}
Using this approach we get wide rows instead of skinny ones. The question is: how do I need to manipulate the data in Pig so that it gets stored in Cassandra following this scheme?
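To make the question concrete, this is roughly what I imagine the Pig script would look like (keyspace, column family name, and input path are made up, and I am not sure this is the right way to build the bag of columns that CassandraStorage expects):

    -- hypothetical input: one (id_content, hit_date) pair per hit
    hits = LOAD 'hits_input' USING PigStorage(',')
           AS (id_content:chararray, hit_date:chararray);

    -- build the aggregate column name id_content:hit_date with an empty value
    cols = FOREACH hits GENERATE
               'single_row' AS rowkey,
               CONCAT(CONCAT(id_content, ':'), hit_date) AS colname,
               '' AS colvalue;

    -- put every column under the single row key as a bag of (name, value) tuples
    grouped  = GROUP cols BY rowkey;
    to_store = FOREACH grouped GENERATE
                   group AS rowkey,
                   cols.(colname, colvalue) AS columns;

    -- MyKeyspace/ContentHits are placeholder names
    STORE to_store INTO 'cassandra://MyKeyspace/ContentHits'
        USING org.apache.cassandra.hadoop.pig.CassandraStorage();

Is grouping the data into a bag of (name, value) tuples like this the intended way to write wide rows from Pig, or is there a better pattern for this kind of schema?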