apache-spark


How to specify the number of partitions when writing a DataFrame to Parquet using PySpark


I want to write a Spark DataFrame to Parquet, but rather than partitioning it with partitionBy, I want to specify the number of partitions (numPartitions) or the size of each partition. Is there an easy way to do that in PySpark?
If all you care about is the number of partitions, the method is exactly the same as for any other output format: repartition the DataFrame into the given number of partitions, then write it with DataFrameWriter:
df.repartition(n).write.parquet(some_path)
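The answer above covers a fixed partition count; the question also asks about controlling the size of each partition. One possible approach, sketched here under assumptions, is to derive n from an estimate of the total data size and a target size per partition. The helper names partitions_for_target_size and write_parquet_with_partitions are hypothetical and not part of PySpark, and the size estimate has to come from you (for example, from the size of the input files). Parquet output is compressed and column-encoded, so the resulting file sizes will only roughly track the estimate.

```python
import math


def partitions_for_target_size(total_bytes, target_bytes_per_partition):
    """Hypothetical helper: partition count so that each partition holds
    roughly target_bytes_per_partition of the estimated total_bytes."""
    if total_bytes < 0 or target_bytes_per_partition <= 0:
        raise ValueError("total_bytes must be >= 0 and the target must be > 0")
    # Round up, and always keep at least one partition.
    return max(1, math.ceil(total_bytes / target_bytes_per_partition))


def write_parquet_with_partitions(df, num_partitions, path):
    """Hypothetical wrapper around the one-liner from the answer.

    df is assumed to be a pyspark.sql.DataFrame. repartition() shuffles the
    rows into num_partitions partitions, and the writer then emits at most
    one part file per partition.
    """
    df.repartition(num_partitions).write.parquet(path)


# Example: about 10 GiB of data, aiming for about 128 MiB per partition.
n = partitions_for_target_size(10 * 1024**3, 128 * 1024**2)
print(n)  # 80

# With a real DataFrame and output path (both assumed to exist):
# write_parquet_with_partitions(df, n, some_path)
```

Note that repartition triggers a full shuffle, so this trades some write-time cost for control over the number of output files.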
