Delete HdfsTarget before running a SparkSubmitTask
Community, I'd like to delete the HdfsTarget folder before running a SparkSubmitTask. What is the best practice? So far I tried two options mentioned in the code attached without success: Dependent/required job doesn't get executed if HdfsTarget already exists Tasks would be executed in parallel if called with yield import luigi import luigi.format import luigi.contrib.hdfs from luigi.contrib.spark import SparkSubmitTask class CleanUp(luigi.Task): path = luigi.Parameter() def run(self): self.target = luigi.contrib.hdfs.HdfsTarget(self.path, format=luigi.format.Gzip) if self.target.exists(): self.target.remove(skip_trash=True) class MySparkTask(SparkSubmitTask): output = luigi.Parameter() driver_memory = '8g' executor_memory = '3g' num_executors = 5 app = 'my-app.jar' entry_class = 'com.company.MyJob' def app_options(self): return ['/input', self.output] def requires(self): (1) def output(self): return luigi.contrib.hdfs.HdfsTarget(self.output, format=luigi.format.Gzip) class RunAll(luigi.Task): result_dir = '/output' ''' Dummy task that triggers execution of a other tasks''' def requires(self): (2) return MySparkTask(self.result_dir)
Spark Streaming: How Spark and Kafka communication happens?
Error while invoking spark-shell on windows
Best way to iterate/stream a Spark Dataframe
Is it is required to be data in hive matastore to be used in sql-context from spark?
How to modify a Spark Dataframe with a complex nested structure?
Object not serializable error on org.apache.avro.generic.GenericData$Record
How to run Spark Sql on a 10 Node cluster
How to do group by range query
Visualising a Matrix
More than one hour to execute pyspark.sql.DataFrame.take(4)
How to map a JavaDstream object into a string? Spark Streaming and Model Prediction JAVA
spark-submit: workers do not get assigned to the master
Fuzzy text matching in Spark
Spark: Match columns from two dataframes
Spark Jobs crashing with ExitCodeException exitCode=15
Spark-Cassandra: how to efficiently restrict partitions