Apache Spark: Reading a file in standalone cluster mode


I currently load a graph from a file when I run my GraphX application locally.
I'd like to run the application in standalone cluster mode.
Do I have to make changes, such as placing the file on each cluster node?
Or can I leave my application unchanged and keep the file only on the driver?
Thank you.
For the executors to read an input file, the file must be accessible from every node that runs an executor. Keeping it only on the driver is generally not enough, because the executors, not the driver, read the input.
The preferred way is to read the file from storage that all nodes can reach, e.g. HDFS or Cassandra.
Placing a copy of the file at the same path on every node can also work, but it isn't the recommended way.
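As a rough illustration of the shared-storage approach, here is a minimal sketch. It assumes the graph is stored as a plain edge list (one "srcId dstId" pair per line), which the question does not state. The HDFS URI, the host name `namenode` and the object name are hypothetical placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, GraphLoader}

// Hypothetical application name; adapt to your own project.
object LoadGraphFromSharedStorage {
  def main(args: Array[String]): Unit = {
    // The master URL is left out on purpose so that spark-submit can supply it.
    val conf = new SparkConf().setAppName("LoadGraphFromSharedStorage")
    val sc = new SparkContext(conf)

    // Assumed location on storage that every executor can reach.
    // Pass the real path as the first argument; the default is a placeholder.
    val edgeFile = args.headOption
      .getOrElse("hdfs://namenode:8020/data/graph/edges.txt")

    // Executors read the file partitions themselves, so the path must
    // resolve on every node, not only on the driver.
    val graph: Graph[Int, Int] = GraphLoader.edgeListFile(sc, edgeFile)

    println(s"vertices=${graph.numVertices}, edges=${graph.numEdges}")
    sc.stop()
  }
}
```

With this layout, the only change from a local run should be the path: a local path becomes a URI on the shared store, and the rest of the application can stay as it is.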
