Merge tuples in Pig
I have two sets of tuples. I want to inner join them on the first element and merge the remaining parts into a single tuple. How can I implement this in Pig on Hadoop?

Input, first tuple set:

    1,(1,2)
    2,(2,3)

Input, second tuple set:

    1,(b,c,b,c)
    2,(c,d,c,d)

Expected output:

    1,(1,2,b,c,b,c)
    2,(2,3,c,d,c,d)

Thanks in advance,
Lin
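One approach worth considering: join the two relations on the id, flatten both tuples into top-level fields, then wrap those fields back into a single tuple.

Inputs (the id and the tuple are tab-separated, matching PigStorage('\t')):

dataA:

    1	(1,2)
    2	(2,3)

dataB:

    1	(b,c,b,c)
    2	(c,d,c,d)

Pig script:

    -- Load each file as an id plus a typed tuple.
    A = LOAD 'dataA' USING PigStorage('\t') AS (aid:long, atuple:tuple(af1:long, af2:long));
    B = LOAD 'dataB' USING PigStorage('\t') AS (bid:long, btuple:tuple(bf1:chararray, bf2:chararray, bf3:chararray, bf4:chararray));

    -- Inner join on the id.
    C = JOIN A BY aid, B BY bid;

    -- Flatten both tuples so their fields sit side by side.
    D = FOREACH C GENERATE aid AS id,
            FLATTEN(atuple) AS (af1:long, af2:long),
            FLATTEN(btuple) AS (bf1:chararray, bf2:chararray, bf3:chararray, bf4:chararray);

    -- Wrap the range of fields af1 through bf4 back into one tuple.
    E = FOREACH D GENERATE id, (af1..bf4);

    DUMP E;

Output of DUMP E:

    (1,(1,2,b,c,b,c))
    (2,(2,3,c,d,c,d))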
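
To check the expected result without a cluster, the join-and-merge logic can be modelled in plain Python. This is only a sketch of the data flow, not Pig code and not how Pig executes it; the name join_and_merge and the in-memory lists are assumptions made for illustration.

```python
# Plain-Python model of the join-then-merge idea; names here are hypothetical.
data_a = [(1, (1, 2)), (2, (2, 3))]
data_b = [(1, ("b", "c", "b", "c")), (2, ("c", "d", "c", "d"))]


def join_and_merge(left, right):
    """Inner-join two lists of (key, tuple) rows on key and concatenate the tuples."""
    # Index the right side by key; a key may appear more than once.
    right_by_key = {}
    for key, tup in right:
        right_by_key.setdefault(key, []).append(tup)

    merged = []
    for key, left_tup in left:
        # Keys with no match on the right produce no rows, as in an inner join.
        for right_tup in right_by_key.get(key, []):
            merged.append((key, left_tup + right_tup))
    return merged


result = join_and_merge(data_a, data_b)
for row in result:
    print(row)
```

Tuple concatenation here plays the role of the two FLATTEN calls followed by re-wrapping the fields into one tuple.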