apache-spark


PySpark exclude files from list


When I use sc.textFile('*.txt'), I read every matching file.
I'd like to be able to filter out several files.
For example, how can I read all files except ['bar.txt', 'foo.txt']?
This is more of a workaround:
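Get the file list (the last column of each line of hadoop fs -ls output is the file path):
import os
ls_lines = os.popen('hadoop fs -ls <your dir>').readlines()
# Skip blank lines and the "Found N items" header; keep only the path column
file_list = [line.split()[-1] for line in ls_lines
             if line.strip() and not line.startswith('Found')]
Filter it, keeping .txt files whose names are not excluded:
excluded = ['bar.txt', 'foo.txt']
file_list = [x for x in file_list
             if os.path.basename(x) not in excluded and x.endswith('.txt')]
Read it (sc.textFile takes a comma-separated string of paths, not a Python list):
rdd = sc.textFile(','.join(file_list))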
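The listing-and-filtering step can be pulled out into a small helper so it can be checked without a cluster. This is a minimal sketch, assuming the usual hadoop fs -ls layout (permissions first, path last, a "Found N items" header line). The function name select_paths and the sample listing below are hypothetical, made up for illustration; fnmatchcase is used so a shell-style pattern such as "*.txt" can be reused.

```python
import os
from fnmatch import fnmatchcase


def select_paths(ls_output, excluded, pattern="*.txt"):
    """Hypothetical helper: return file paths from `hadoop fs -ls` output
    whose base name matches `pattern` and is not in `excluded`."""
    paths = []
    for line in ls_output.splitlines():
        fields = line.split()
        # Skip blank lines and the "Found N items" header line.
        if not fields or line.startswith("Found"):
            continue
        # Skip directories (their permission string starts with "d").
        if fields[0].startswith("d"):
            continue
        path = fields[-1]  # assumes the path is the last column
        name = os.path.basename(path)
        if fnmatchcase(name, pattern) and name not in excluded:
            paths.append(path)
    return paths


# Made-up listing in the assumed `hadoop fs -ls` format.
sample = """Found 4 items
-rw-r--r--   3 user group   12 2016-01-01 10:00 /data/a.txt
-rw-r--r--   3 user group   12 2016-01-01 10:00 /data/bar.txt
-rw-r--r--   3 user group   12 2016-01-01 10:00 /data/foo.txt
-rw-r--r--   3 user group   12 2016-01-01 10:00 /data/b.csv
"""

paths = select_paths(sample, excluded={"bar.txt", "foo.txt"})
print(",".join(paths))  # prints /data/a.txt
```

With a live SparkContext, the result would then be read with sc.textFile(",".join(paths)). Splitting on whitespace means paths containing spaces are not handled; that is a known limitation of this sketch.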

Related Links

How to do group by range query
Visualising a Matrix
More than one hour to execute pyspark.sql.DataFrame.take(4)
How to map a JavaDstream object into a string? Spark Streaming and Model Prediction JAVA
spark-submit: workers do not get assigned to the master
Fuzzy text matching in Spark
Spark: Match columns from two dataframes
Spark Jobs crashing with ExitCodeException exitCode=15
Spark-Cassandra: how to efficiently restrict partitions
Spark job on hbase data
SparkSQL restrict queries by Cassandra partition key ranges
Merging equi-partitioned data frames in Spark
Writing custom UDF in spark on a List to get Index
Getting the cluster hierarchy using BisectingKMeans clustering
Intellij connect hortonwork spark remotely failed
Spark - How can get the Logical / Physical Query execution using - Thirft - Hive Interactor