apache-spark


Spark 2.0 on YARN: container state COMPLETE, exit status -100


Can someone point me to documentation on what a -100 exit code means? This is an EMR cluster running Spark 2.0.0 on YARN (the standard EMR spark-cluster deployment). I've seen https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cdh_sg_yarn_container_exec_errors.html, which lists several error codes, but -100 is not among them. As a more general question, it seems that neither the YARN container logs nor the Spark executor logs contain much information about what causes such a failure. From the YARN logs I see:
17/01/18 17:51:58 INFO YarnAllocator: Canceling requests for 1 executor container(s) to have a new desired total 4164 executors.
17/01/18 17:51:58 INFO YarnAllocator: Driver requested a total number of 4163 executor(s).
17/01/18 17:51:58 INFO YarnAllocator: Canceling requests for 1 executor container(s) to have a new desired total 4163 executors.
17/01/18 17:51:58 INFO YarnAllocator: Driver requested a total number of 4162 executor(s).
17/01/18 17:51:58 INFO YarnAllocator: Canceling requests for 1 executor container(s) to have a new desired total 4162 executors.
17/01/18 17:51:59 INFO YarnAllocator: Driver requested a total number of 4161 executor(s).
17/01/18 17:51:59 INFO YarnAllocator: Driver requested a total number of 4160 executor(s).
17/01/18 17:51:59 INFO YarnAllocator: Canceling requests for 2 executor container(s) to have a new desired total 4160 executors.
17/01/18 17:52:00 INFO YarnAllocator: Driver requested a total number of 4159 executor(s).
17/01/18 17:52:00 INFO YarnAllocator: Canceling requests for 1 executor container(s) to have a new desired total 4159 executors.
17/01/18 17:52:00 INFO YarnAllocator: Completed container container_1483555419510_0037_01_000114 on host: ip-172-20-221-152.us-west-2.compute.internal (state: COMPLETE, exit status: -100)
17/01/18 17:52:00 WARN YarnAllocator: Container marked as failed: container_1483555419510_0037_01_000114 on host: ip-172-20-221-152.us-west-2.compute.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
17/01/18 17:52:00 INFO YarnAllocator: Completed container container_1483555419510_0037_01_000107 on host: ip-172-20-221-152.us-west-2.compute.internal (state: COMPLETE, exit status: -100)
17/01/18 17:52:00 WARN YarnAllocator: Container marked as failed: container_1483555419510_0037_01_000107 on host: ip-172-20-221-152.us-west-2.compute.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
17/01/18 17:52:00 INFO YarnAllocator: Will request 2 executor containers, each with 7 cores and 22528 MB memory including 2048 MB overhead
17/01/18 17:52:00 INFO YarnAllocator: Canceled 0 container requests (locality no longer needed)
17/01/18 17:52:00 INFO YarnAllocator: Submitted container request (host: Any, capability: <memory:22528, vCores:7>)
17/01/18 17:52:00 INFO YarnAllocator: Submitted container request (host: Any, capability: <memory:22528, vCores:7>)
17/01/18 17:52:01 INFO YarnAllocator: Driver requested a total number of 4158 executor(s).
17/01/18 17:52:01 INFO YarnAllocator: Canceling requests for 1 executor container(s) to have a new desired total 4158 executors.
17/01/18 17:52:02 INFO YarnAllocator: Driver requested a total number of 4157 executor(s).
and in the Spark executor logs I see:
17/01/18 17:39:39 INFO MemoryStore: MemoryStore cleared
17/01/18 17:39:39 INFO BlockManager: BlockManager stopped
17/01/18 17:39:39 INFO ShutdownHookManager: Shutdown hook called
Neither of these is very informative. What does exit status -100 indicate, and where should I look for more detail on why the containers failed?
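
For reference, one way to see what a given numeric code corresponds to is to look it up against the ContainerExitStatus constants shipped with the Hadoop version on the cluster. The snippet below is only a minimal sketch, not part of the job above; it assumes org.apache.hadoop:hadoop-yarn-api is on the classpath (which it is on an EMR Spark node), and the object name is just for illustration:

import org.apache.hadoop.yarn.api.records.ContainerExitStatus

object ExitStatusLookup {
  def main(args: Array[String]): Unit = {
    // Collect the public static int constants of ContainerExitStatus via reflection,
    // so the lookup reflects whatever this particular Hadoop version defines.
    val codes: Map[Int, String] = classOf[ContainerExitStatus].getFields
      .filter(_.getType == java.lang.Integer.TYPE)
      .map(f => f.getInt(null) -> f.getName)
      .toMap

    // Exit status to decode; defaults to the -100 seen in the YarnAllocator log above.
    val status = args.headOption.map(_.toInt).getOrElse(-100)
    println(s"$status -> ${codes.getOrElse(status, "not defined in this Hadoop version")}")
  }
}

On the Hadoop builds I have checked, -100 maps to ContainerExitStatus.ABORTED, i.e. the ResourceManager released the container rather than the process exiting on its own, which would be consistent with the "Container released on a *lost* node" diagnostic in the log above.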
