Merge tuples in Pig
I have two sets of tuples. I want to inner join them on the first element and merge the remaining parts into one tuple. How can I implement this in Pig on Hadoop?

Input (two tuple sets):

1,(1,2)
2,(2,3)

1,(b,c,b,c)
2,(c,d,c,d)

Expected output:

1,(1,2,b,c,b,c)
2,(2,3,c,d,c,d)

Thanks in advance,
Lin
One approach worth considering.

Inputs (tab-separated):

dataA:
1	(1,2)
2	(2,3)

dataB:
1	(b,c,b,c)
2	(c,d,c,d)

Pig script:

-- Load each file as an id plus a tuple of fields.
A = LOAD 'dataA' USING PigStorage('\t') AS (aid:long, atuple:tuple(af1:long, af2:long));
B = LOAD 'dataB' USING PigStorage('\t') AS (bid:long, btuple:tuple(bf1:chararray, bf2:chararray, bf3:chararray, bf4:chararray));
-- Inner join on the id.
C = JOIN A BY aid, B BY bid;
-- Flatten both tuples so their fields sit side by side in one row.
D = FOREACH C GENERATE aid AS id,
    FLATTEN(atuple) AS (af1:long, af2:long),
    FLATTEN(btuple) AS (bf1:chararray, bf2:chararray, bf3:chararray, bf4:chararray);
-- Project the field range af1..bf4 back into a single tuple.
E = FOREACH D GENERATE id, (af1..bf4);
DUMP E;

Output of DUMP E:

(1,(1,2,b,c,b,c))
(2,(2,3,c,d,c,d))
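To check the expected result without a cluster, the JOIN-then-FLATTEN step can be modelled in plain Python. This is a hypothetical local sketch, not Pig: `join_and_merge`, `data_a` and `data_b` are illustrative names, and the in-memory lists stand in for the tab-separated dataA and dataB files. Like Pig's default inner JOIN, every matching pair produces one row and ids present on only one side are dropped.

```python
from collections import defaultdict


def join_and_merge(a_rows, b_rows):
    """Inner-join two lists of (id, tuple) pairs on id and concatenate the tuples.

    Models JOIN A BY aid, B BY bid followed by flattening both tuples into one:
    each matching (a, b) pair yields one output row; unmatched ids are dropped.
    """
    # Index the right-hand side by id so each lookup is a dict access.
    b_index = defaultdict(list)
    for bid, btuple in b_rows:
        b_index[bid].append(btuple)

    merged = []
    for aid, atuple in a_rows:
        # .get does not insert missing keys, so unmatched ids just yield nothing.
        for btuple in b_index.get(aid, []):
            merged.append((aid, tuple(atuple) + tuple(btuple)))
    return merged


# Hypothetical in-memory versions of dataA and dataB from the question.
data_a = [(1, (1, 2)), (2, (2, 3))]
data_b = [(1, ("b", "c", "b", "c")), (2, ("c", "d", "c", "d"))]

for row in join_and_merge(data_a, data_b):
    print(row)
# (1, (1, 2, 'b', 'c', 'b', 'c'))
# (2, (2, 3, 'c', 'd', 'c', 'd'))
```

If an id occurs several times on either side, the sketch (like a relational inner join) emits one merged row per matching pair rather than collapsing them.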