Using HBase for analytics

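I'm almost completely new to HBase. I'd like to take my current site tracking, which is based on MySQL, and move it to HBase, because MySQL simply isn't scaling anymore.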

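I need to track different user actions and be able to aggregate them by several dimensions (date, the country the user comes from, the product the action was performed on, etc.).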

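The way I store it now is that I have a table with a composite PK made up of all these dimensions (country, date, product, ...), and the remaining fields are counters for the actions. When an action happens, I insert into the table and increment that action's column (ON DUPLICATE KEY UPDATE...).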

*date      | *country | *product | visited | liked | put_to_basket | purchased
2011-11-11 | US       | 123      | 2       | 1     | 0             | 0
2011-11-11 | GB       | 123      | 23      | 10    | 5             | 4
2011-11-12 | GB       | 555      | 54      | 0     | 10            | 2
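
For readers unfamiliar with the pattern, the upsert-and-increment scheme described above can be sketched as follows. This is only an illustration under assumptions: it uses Python's built-in SQLite module and SQLite's `ON CONFLICT ... DO UPDATE` as a stand-in for MySQL's `ON DUPLICATE KEY UPDATE`, and the table name `action_counts` and the helper `record_visit` are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE action_counts (
        date TEXT, country TEXT, product INTEGER,
        visited INTEGER NOT NULL DEFAULT 0,
        liked INTEGER NOT NULL DEFAULT 0,
        put_to_basket INTEGER NOT NULL DEFAULT 0,
        purchased INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (date, country, product)
    )
""")

def record_visit(date, country, product):
    """Insert a row for the key, or bump its 'visited' counter if it exists.

    Uses SQLite's upsert syntax (needs SQLite 3.24.0 or newer).
    """
    conn.execute(
        """
        INSERT INTO action_counts (date, country, product, visited)
        VALUES (?, ?, ?, 1)
        ON CONFLICT (date, country, product)
        DO UPDATE SET visited = visited + 1
        """,
        (date, country, product),
    )

record_visit("2011-11-11", "US", 123)
record_visit("2011-11-11", "US", 123)
visited = conn.execute(
    "SELECT visited FROM action_counts WHERE date = ? AND country = ? AND product = ?",
    ("2011-11-11", "US", 123),
).fetchone()[0]
```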

I have a feeling that this is completely against the HBase way, and it also doesn't really scale (as the number of keys grows, inserts get expensive) and isn't really flexible.

So how do I track user actions and their attributes efficiently in HBase? What should the table look like? Where does MapReduce come into play?

Thanks for all suggestions!

Best answer

Lars George's “HBase: The Definitive Guide” explains a design similar to what you want to achieve.

Answers

You can do it as follows:

Use a unique row ID in HBase like the following:

rowid = date + country + product -> concatenate these into a single value and use it as the key.
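
As a rough sketch of that key layout (not the HBase client API): the helper name `make_row_key` and the `|` separator are assumptions chosen for illustration. HBase keeps rows sorted by key bytes, so putting the date first means all rows for one day sit next to each other and can be selected by key prefix.

```python
def make_row_key(date: str, country: str, product: str) -> bytes:
    """Join the three dimensions into one row key (the '|' separator is an assumption)."""
    return "|".join((date, country, product)).encode("utf-8")

# Sorting the byte strings mimics the order in which HBase would store the rows.
keys = sorted([
    make_row_key("2011-11-12", "GB", "555"),
    make_row_key("2011-11-11", "US", "123"),
    make_row_key("2011-11-11", "GB", "123"),
])

# Because the date leads the key, one day's rows are contiguous and share a prefix.
day_rows = [k for k in keys if k.startswith(b"2011-11-11|")]
```

The order of the parts matters: a key that starts with the date favours per-day queries, while aggregating by country or product across many days would touch many separate key ranges.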

Then store the counters as columns. So when you receive an event like this:

if (event == liked) {
    increment the "liked" column in HBase by 1 for the corresponding row key
}

and likewise for the other cases.
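
The per-event increment can be modelled in a few lines. This is an in-memory sketch only: a plain dictionary stands in for the HBase table, and `record_event`, `ACTIONS`, and the `|` key separator are hypothetical names chosen here. With a real HBase client, the `+= 1` would instead be an atomic server-side counter increment (the Java client offers `incrementColumnValue` for this).

```python
from collections import defaultdict

ACTIONS = ("visited", "liked", "put_to_basket", "purchased")

# row key -> {column name -> counter}; a dict standing in for an HBase table.
table = defaultdict(lambda: dict.fromkeys(ACTIONS, 0))

def record_event(date: str, country: str, product: str, action: str) -> None:
    """Increment the counter column matching the action for the composite row key."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    row_key = f"{date}|{country}|{product}"
    table[row_key][action] += 1

# Replaying the events behind the first row of the example table.
record_event("2011-11-11", "US", "123", "visited")
record_event("2011-11-11", "US", "123", "visited")
record_event("2011-11-11", "US", "123", "liked")
```

Every action type goes through the same code path, so adding a new action only means adding a new column name rather than changing the table layout.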

Hope that helps!




