Simply put, if your log file looks like this, what is the best way in MapReduce to count unique visitors?
DATE siteID action username
05-05-2010 siteA pageview jim
05-05-2010 siteB pageview tom
05-05-2010 siteA pageview jim
05-05-2010 siteB pageview bob
05-05-2010 siteA pageview mike
Specifically, the goal is to find the number of unique visitors for each site.
I was thinking the mapper would emit (siteID, username) pairs, and the reducer would keep a set() of the unique usernames per key and then emit the length of that set. However, that could mean storing millions of usernames in memory, which doesn't seem right. Does anyone have a better way?
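To make the idea concrete, here is a rough sketch of the set-based approach described above, run locally on the sample log. The function names `mapper` and `reducer` are just illustrative, and `sorted()` stands in for the framework's shuffle/sort step, which is an assumption about how the pairs would reach the reducer. It still has the memory problem I'm worried about, since each site's set holds every distinct username.

```python
from itertools import groupby


def mapper(lines):
    """Emit a (siteID, username) pair for each log record."""
    for line in lines:
        fields = line.split()
        # Skip the header row and any malformed line.
        if len(fields) != 4 or fields[0] == "DATE":
            continue
        _date, site_id, _action, username = fields
        yield site_id, username


def reducer(sorted_pairs):
    """Count distinct usernames per siteID; input must be sorted by siteID."""
    for site_id, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        # This set is the memory concern: it grows with the number of
        # distinct users seen for the site.
        users = {username for _, username in group}
        yield site_id, len(users)


log = [
    "DATE siteID action username",
    "05-05-2010 siteA pageview jim",
    "05-05-2010 siteB pageview tom",
    "05-05-2010 siteA pageview jim",
    "05-05-2010 siteB pageview bob",
    "05-05-2010 siteA pageview mike",
]

# sorted() is a local stand-in for the shuffle/sort between map and reduce.
counts = dict(reducer(sorted(mapper(log))))
print(counts)
```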
I'm using streaming, reading the input line by line.
Thanks.