apache-spark


How to create an empty RowMatrix in Apache Spark


Is there any way to create an empty RowMatrix in Apache Spark? I have tried the following:
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
double[] empty = new double[0];
Vector vector = Vectors.dense(empty);
But I cannot create a JavaRDD of Vector from vector, so that I can create the RowMatrix from it.
Thanks in advance.
Looking at https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.distributed.RowMatrix, it is not possible to create an empty RowMatrix.
The constructor requires an RDD<Vector> to be passed, whichever way you call it.
I tried a sample in the spark-shell in Scala. Hope this helps:
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix
val data = sc.parallelize(Array(Array[Double](1, 2, 3, 4), Array[Double](2, 3, 4, 5), Array[Double](3, 4, 5, 6))).map(x => Vectors.dense(x))
val rowMatrix: RowMatrix = new RowMatrix(data)
Now we can perform the required operations on rowMatrix, which is of type RowMatrix, while data is an RDD<Vector>.
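For example, continuing the same spark-shell session, a few of the operations RowMatrix exposes (the values in the comments are what I would expect for the 3x4 matrix built above):
val m = rowMatrix.numRows()   // 3
val n = rowMatrix.numCols()   // 4
// Column-wise summary statistics (mean, variance, ...)
val summary = rowMatrix.computeColumnSummaryStatistics()
println(summary.mean)
// SVD keeping the top 2 singular values
val svd = rowMatrix.computeSVD(2, computeU = true)
println(svd.s)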
Also, Vectors.dense requires an Array of Double, so we might need to cast the initial array to Double if it is not already.
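For instance, if the source rows were Ints (just an assumed example), the cast could look like this:
// Hypothetical input: Int rows, mapped to Double before calling Vectors.dense
val intRows = sc.parallelize(Array(Array(1, 2, 3, 4), Array(2, 3, 4, 5)))
val doubleVectors = intRows.map(row => Vectors.dense(row.map(_.toDouble)))
val matrixFromInts = new RowMatrix(doubleVectors)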
