• How to combine/concat two bags in Pig Latin
  • Time: 2012-05-19 00:40:24
  • Tags: apache-pig

I have two data sets:

A = {uid, url}; B = {uid, url};

Now I COGROUP them:

C = COGROUP A BY uid, B BY uid;

and I want to turn C into {group AS uid, DISTINCT A.url+B.url};

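My question is: how do I concatenate the two bags A.url and B.url?

Or, to put it differently, how do I apply DISTINCT to url across multiple columns?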

Answers

This may not be what you expect, but here is what I understood from your question:

C = JOIN A BY uid, B BY uid;
D = DISTINCT C;

Then concatenate like this:

E = FOREACH D GENERATE CONCAT(A::uid,B::uid);

A = LOAD 'A' USING PigStorage() AS (uid, url);
B = LOAD 'B' USING PigStorage() AS (uid, url);
C = JOIN A BY uid, B BY uid;
D = FOREACH C GENERATE $0, CONCAT(A::url, B::url);
E = DISTINCT D;
DUMP E;
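
Another way to get the distinct urls per uid, offered as a minimal sketch rather than a tested answer: since A and B share the same schema, you could UNION them first and then GROUP the result, so that the urls from both relations land in a single bag. The relation names AB, G and D, the aliases u and urls, and the count field n are made up for illustration. Note that CONCAT in the scripts above joins two string values pairwise; it does not merge bags.

```pig
-- Assumes A and B are already loaded with the schema (uid, url).
AB = UNION A, B;              -- stack the rows of both relations
G  = GROUP AB BY uid;         -- one bag of (uid, url) tuples per uid
D  = FOREACH G {
         u    = AB.url;       -- project the url column of the grouped bag
         urls = DISTINCT u;   -- nested DISTINCT drops duplicate urls
         GENERATE group AS uid, urls, COUNT(urls) AS n;
     };
DUMP D;
```

Each output row would then hold the uid, the bag of its distinct urls, and how many there are. This avoids COGROUP entirely, which may or may not suit your case if you need to keep track of which relation a url came from.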
