Path for persistent, managed hive tables in Spark 1.4
I am new to Spark and am working from the JavaSqlNetworkWordCount example, trying to append the word counts to a persistent table. I understand that I can only do this via `HiveContext`. `HiveContext`, however, keeps trying to save the table under `/user/hive/warehouse/`.

I have tried changing the path in two ways: by calling `hiveContext.setConf("hive.metastore.warehouse.dir", "/home/user_name");` in the code, and by adding the property `<property><name>hive.metastore.warehouse.dir</name><value>/home/user_name</value></property>` to `$SPARK_HOME/conf/hive-site.xml`, but neither seems to work.

If anyone else has faced this problem, please let me know if/how you resolved it. I am using Spark 1.4 on my local RHEL 5 machine.
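For reference, this is a sketch of the `hive-site.xml` fragment I am using, with the property wrapped in a full `<configuration>` element as Hive expects (the path is just my local choice):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/home/user_name</value>
    <description>Location of the default database for the warehouse</description>
  </property>
</configuration>
```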
I think I solved the problem. It looks like `spark-submit` was creating a `metastore_db` directory in the directory containing the jar file. If `metastore_db` already exists, the `hive-site.xml` values are ignored. As soon as I removed that directory, the code picked up the values from `hive-site.xml`. I still cannot set the `hive.metastore.warehouse.dir` property from the code, though.
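A minimal sketch of the cleanup, assuming the stale embedded-Derby files sit in the directory you launch `spark-submit` from (the `mkdir`/`touch` lines just simulate a leftover metastore for illustration):

```shell
# Spark's embedded Derby metastore lives in ./metastore_db (plus derby.log)
# in whatever directory the job was launched from. Simulate a stale one:
mkdir -p metastore_db
touch derby.log

# Remove the stale metastore so the next HiveContext re-reads hive-site.xml
# instead of reusing the old Derby state.
rm -rf metastore_db derby.log
```

After the removal, re-running `spark-submit` from the same directory picked up the warehouse path from `hive-site.xml` again.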