Merge tuples in Pig


I have two sets of tuples. I want to inner join them on the first element and merge the remaining parts into a single tuple. How can I implement this in Pig on Hadoop?
Input, first tuple set:
1,(1,2)
2,(2,3)
Input, second tuple set:
1,(b,c,b,c)
2,(c,d,c,d)
Expected output:
1,(1,2,b,c,b,c)
2,(2,3,c,d,c,d)
Thanks in advance,
Lin
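One approach worth considering: join the two relations on the id, flatten both tuples into top-level fields, then wrap those fields back into a single tuple.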
Inputs (tab-separated):
dataA:
1 (1,2)
2 (2,3)
dataB:
1 (b,c,b,c)
2 (c,d,c,d)
Pig script:
-- Load both inputs as (id, tuple); fields are tab-separated.
A = LOAD 'dataA' USING PigStorage('\t') AS (aid:long, atuple:tuple(af1:long, af2:long));
B = LOAD 'dataB' USING PigStorage('\t') AS (bid:long, btuple:tuple(bf1:chararray, bf2:chararray, bf3:chararray, bf4:chararray));
-- Inner join the two relations on the id.
C = JOIN A BY aid, B BY bid;
-- Flatten both tuples into top-level fields.
D = FOREACH C GENERATE aid AS id, FLATTEN(atuple) AS (af1:long, af2:long), FLATTEN(btuple) AS (bf1:chararray, bf2:chararray, bf3:chararray, bf4:chararray);
-- Wrap the field range af1..bf4 back into a single tuple.
E = FOREACH D GENERATE id, (af1..bf4);
DUMP E;
Output of DUMP E:
(1,(1,2,b,c,b,c))
(2,(2,3,c,d,c,d))
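To sanity-check the expected result outside Pig, the same join, flatten, and re-wrap steps can be mirrored in plain Python. This is only an illustrative sketch of the semantics, not Pig code; the function name merge_by_key and the in-memory sample lists are assumptions made for the example.

```python
def merge_by_key(rows_a, rows_b):
    """Inner-join two lists of (key, tuple) rows and concatenate the tuples."""
    # Index the right-hand rows by key; a key may appear more than once.
    index_b = {}
    for key, tup in rows_b:
        index_b.setdefault(key, []).append(tup)

    merged = []
    for key, tup_a in rows_a:
        # Keys missing from rows_b produce no output (inner-join behaviour).
        for tup_b in index_b.get(key, []):
            merged.append((key, tup_a + tup_b))
    return merged


# Hypothetical in-memory copies of dataA and dataB.
data_a = [(1, (1, 2)), (2, (2, 3))]
data_b = [(1, ("b", "c", "b", "c")), (2, ("c", "d", "c", "d"))]

result = merge_by_key(data_a, data_b)
for row in result:
    print(row)
```

Like JOIN in Pig, a key that appears several times on both sides yields one merged row per matching pair.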
