apache-spark


How to specify the number of partitions when writing a DataFrame to Parquet using PySpark


I want to write a Spark DataFrame to Parquet, but rather than specifying partitionBy, I want to specify numPartitions or the size of each partition. Is there an easy way to do that in PySpark?
If all you care about is the number of partitions, the method is exactly the same as for any other output format: repartition the DataFrame into the desired number of partitions, then use DataFrameWriter:
df.repartition(n).write.parquet(some_path)
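
The question also asks about controlling the size of each partition. The answer above does not cover that, so what follows is only a rough sketch under stated assumptions: you supply your own estimate of the total data size, and the partition count is derived from it. The helper names (partitions_for_target_size, write_parquet_with_n_files) are hypothetical, not part of PySpark, and the size of the resulting files will usually differ from the estimate because Parquet encodes and compresses the data.

```python
import math


def partitions_for_target_size(total_bytes, target_bytes_per_partition):
    """Hypothetical helper: number of partitions so that each one holds
    roughly target_bytes_per_partition, given a caller-supplied estimate
    of the total data size. Always returns at least 1."""
    if target_bytes_per_partition <= 0:
        raise ValueError("target_bytes_per_partition must be positive")
    return max(1, math.ceil(total_bytes / target_bytes_per_partition))


def write_parquet_with_n_files(df, path, n):
    """Hypothetical wrapper around the one-liner from the answer.

    repartition(n) shuffles the rows into n partitions, and each
    non-empty partition is written out as one Parquet part file.
    """
    df.repartition(n).write.parquet(path)


# Example: an (assumed) 1 GiB dataset split into ~128 MiB partitions.
n = partitions_for_target_size(1024 ** 3, 128 * 1024 ** 2)
# write_parquet_with_n_files(df, some_path, n)  # df and some_path as in the answer
```

If you only need to reduce the number of partitions, df.coalesce(n) avoids a full shuffle, at the cost of possibly uneven partition sizes.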
