Referencing field in nested tuple in PIG

  • Date: 2012-01-12 16:55:27
  • Tags: apache-pig

I have been stuck on this for several hours and I cannot figure out what I am doing wrong. I have a relation "grouped" with the schema of

    grouped: {seedword: chararray,baggy: {outertup: (groupy: (seedword: chararray,coword: chararray))}}

A sample of what the relation looks like is: (auto,{((auto,car)),((auto,truck))})

I just need to pull out the seedword and the cowords. For example, I want:

(auto, (car, truck))

I have tried:

 FOREACH grouped GENERATE baggy::outertup.groupy.coword;

 FOREACH grouped GENERATE baggy.outertup.groupy.coword;
 FOREACH grouped GENERATE baggy.groupy.coword;

None of these work; they all give me an error saying that no such field exists. Please help!
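For what it's worth, a single projection in Pig only reaches one level into a bag, and the `outertup` wrapper is implicit, which is why the dotted paths above fail. One possible workaround (a sketch only, and it assumes Pig 0.10 or later, where a FOREACH is allowed inside a nested block) is to dereference the inner tuple inside a nested FOREACH:

```
-- sketch: reach the inner coword via a nested FOREACH
-- (assumes Pig 0.10+, where FOREACH may appear inside a nested block)
result = FOREACH grouped {
    -- iterate the bag; groupy is an ordinary tuple here, so dot-dereference works
    cw = FOREACH baggy GENERATE groupy.coword;
    GENERATE seedword, cw;
};
```

On older Pig versions the same effect needs a FLATTEN followed by a regroup, as in the answer below.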

Here's some more of my code:

keywords = LOAD 'merged' AS (seedword:chararray, doc:chararray);

-- COUNT HOW MANY DOCUMENTS EACH WORD IS IN
group_by_seedword = GROUP keywords BY $0;

invert_index = FOREACH group_by_seedword GENERATE $0 as seedword:chararray, keywords.$1;
word_doc_count= FOREACH invert_index GENERATE seedword, COUNT($1);

-- map words to document
words_in_doc= GROUP keywords BY doc;
word_docs = FOREACH words_in_doc GENERATE group AS doc, keywords.seedword;
--(document:(keyword, keyword, keyword...))

--map words to their cowords in doc
temp_join = JOIN keywords BY doc,word_docs BY doc;
--DUMP temp_join;
cowords_by_doc = FOREACH temp_join GENERATE $0 as seedword:chararray, $3 as cowords;

cowords_interm=  FOREACH cowords_by_doc GENERATE seedword, FLATTEN(cowords);
cowords = FILTER cowords_interm BY (seedword != $1); -- GETS RID OF SINGLE-DOC WORDS
temp_join_count1 = JOIN cowords BY $0, word_doc_count BY seedword; 

-- GETS WORDS THAT OCCUR BY THEMSELVES IN A SINGLE DOCUMENT
G = JOIN cowords_interm BY $0 LEFT OUTER, cowords by $0;
orph_word = FILTER G BY $2 is null;
orph_word_count = FOREACH orph_word GENERATE $0,null, 0;

temp_join_count= UNION temp_join_count1, orph_word_count; 

inter_frac = FOREACH temp_join_count GENERATE $0 as seedword:chararray, $1 as coword:chararray, 1.0/$3 as frac:double;
inter_frac_combine = GROUP inter_frac BY (seedword, coword); 
inter_frac_sum = FOREACH inter_frac_combine GENERATE $0 , SUM(inter_frac.frac) as frac:double;

filtered = FILTER inter_frac_sum BY ($1 >=$relatedness_ratio);
grouped= GROUP filtered by $0.seedword;
g = FOREACH grouped GENERATE group as seedword:chararray, filtered.$0;   
named = FOREACH g GENERATE $0 as seedword:chararray, $1 as baggy:bag{(outertup:tuple(groupy:tuple(seedword:chararray, coword:chararray)))};

An input file you can try should look like this:

car doc1.txt
auto doc1.txt
bunny doc2.txt
ball doc2.txt
toy car doc2.txt
random doc3.txt

plan doc3.txt

Answers

I'd had a similar issue where I couldn't reference inner tuples. My solution was to flatten the data and then do some more filtering and grouping. Cheers, V
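The flatten-based approach mentioned above might look roughly like this. This is a sketch against the `grouped` schema from the question; the `flat`, `pairs`, and `regrouped` names and the exact field positions after FLATTEN are my assumptions:

```
-- flatten the bag so the inner tuple becomes a top-level field
flat = FOREACH grouped GENERATE seedword, FLATTEN(baggy) AS groupy;
-- groupy is now an ordinary tuple, so its fields can be dot-referenced
pairs = FOREACH flat GENERATE seedword, groupy.coword AS coword;
-- regroup to get one bag of cowords per seedword
regrouped = GROUP pairs BY seedword;
result = FOREACH regrouped GENERATE group AS seedword, pairs.coword;
```

On the sample record (auto,{((auto,car)),((auto,truck))}), this should end up with one row per seedword carrying a bag of its cowords.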




