apache-spark


Spark: merging RDD


How should I deal with this RDD structure? After calling the "map" function I get an RDD of MyObject, where MyObject is my own class holding a serialization of an XML document.
I want to merge every MyObject in the RDD into a single one.
I implemented this with the foreach function, calling a specific function inside it, but there are so many MyObject instances that it takes a very long time.
Edit: piece of code implementing 'reduce':
// Parse each input line into a MyObject.
JavaRDD<MyObject> objects = input.map(new Function<String, MyObject>() {
    @Override
    public MyObject call(String s) throws MalformedURLException, SAXException, ParserConfigurationException, IOException {
        // Note: the reader is initialized once per record here.
        machine.initializeReader(delimiter);
        return machine.Request(s);
    }
});

// Combine all MyObject instances pairwise into a single result.
MyObject finalResult = objects.reduce(new Function2<MyObject, MyObject, MyObject>() {
    @Override
    public MyObject call(MyObject myObject, MyObject myObject2) {
        return myObject.merge(myObject2);
    }
});
The 'merge' function loops through the elements of a 'MyObject' and merges the common ones: if one object has a tag with a specific id and the other 'MyObject' has a tag with the same id, the result contains that tag once, with the two added together.
The problem with both 'reduce' and 'foreach' is the time they take.
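To make the merge described above concrete, here is a minimal sketch in plain Java (no Spark dependency). The real MyObject is not shown in this post, so everything inside the class is an assumption: the tags are modeled as a map from tag id to an integer value, and the `with` helper exists only to build test data. The sketch merges by hash lookup on the id rather than by comparing every element of one object against every element of the other, and it returns a new object, which keeps the operation associative and commutative, as 'reduce' requires.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MergeSketch {

    // Hypothetical stand-in for MyObject: maps a tag id to its value.
    static final class MyObject {
        final Map<String, Integer> tags = new HashMap<>();

        // Hypothetical helper used only to build example data.
        MyObject with(String id, int value) {
            tags.merge(id, value, Integer::sum);
            return this;
        }

        // Returns a new object; tags with the same id have their values added.
        // Neither input is modified.
        MyObject merge(MyObject other) {
            MyObject result = new MyObject();
            result.tags.putAll(this.tags);
            other.tags.forEach((id, value) -> result.tags.merge(id, value, Integer::sum));
            return result;
        }
    }

    public static void main(String[] args) {
        MyObject a = new MyObject().with("t1", 1).with("t2", 1);
        MyObject b = new MyObject().with("t2", 1).with("t3", 1);
        MyObject merged = a.merge(b);
        // TreeMap is used only to print the entries in a stable order.
        System.out.println(new TreeMap<>(merged.tags)); // prints {t1=1, t2=2, t3=1}
    }
}
```

With a merge of this shape, each call costs time proportional to the number of tags in the two objects, so the total cost of the 'reduce' grows with the amount of data rather than with the product of the object sizes.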
Thanks
