apache-spark


How to process multiple Kafka topics with multiple Spark jobs in parallel


Please forgive me if this question doesn't make sense, as I am just starting out with Spark and trying to understand it.
From what I've read, Spark is a good fit for real-time analytics on streaming data, with the results then pushed to a downstream sink such as HDFS, Hive, or HBase.
I have two questions about that. First, I am not clear whether only one Spark Streaming job runs at any given time, or several. Say I need to perform different analytics for each Kafka topic (or each source streaming into Kafka) and then push those results downstream.
Does Spark allow you to run multiple streaming jobs in parallel, so that the aggregate analytics stay separate for each stream (in this case, each Kafka topic)? If so, how is that done, and is there any documentation you could point me to?
Just to be clear, my use case is to stream from different sources, where each source may need different analytics and may have a different data structure. I want to be able to have multiple Kafka topics and partitions. I understand that each Kafka partition maps to a Spark partition, so reading can be parallelized.
I am not sure, though, how to run multiple Spark Streaming jobs in parallel so that I can read from multiple Kafka topics and compute separate analytics on each topic/stream.
If this is not possible in Spark, is it possible in Flink?
Second, how does one get started with Spark? There seems to be a company and/or distribution to choose for each component: Confluent for Kafka, Databricks for Spark, HW/CDH/MapR for Hadoop. Does one really need all of these, or what is the minimal and easiest way to get going with a big data pipeline while limiting the number of vendors? It seems like such a huge task even to start on a POC.
You have asked multiple questions so I'll address each one separately.
Does Spark allow you to run multiple streaming jobs in parallel?
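Yes. Each streaming job can be submitted as its own Spark application, and a single application can also define a separate stream per Kafka topic, so the analytics for each topic stay independent.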
Is there any documentation on Spark Streaming with Kafka?
https://spark.apache.org/docs/latest/streaming-kafka-integration.html
How does one get started?
a. Book: https://www.amazon.com/Learning-Spark-Lightning-Fast-Data-Analysis/dp/1449358624/
b. Easy way to run/learn Spark: https://community.cloud.databricks.com
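
To make the "yes" on parallel streaming jobs a bit more concrete, here is a minimal sketch in PySpark. It is only an illustration under several assumptions: it uses the Structured Streaming API rather than the DStream API covered by the linked guide, the topic names (`clicks`, `orders`), the broker address `localhost:9092`, the checkpoint directory, and the two toy analytics are all made up, and the results go to the console instead of a real sink such as HDFS or HBase.

```python
# Sketch: one independent streaming query per Kafka topic inside one Spark application.
# Topic names, broker address, and checkpoint path are hypothetical placeholders.


def count_per_key(df):
    """Toy analytic: number of records seen for each Kafka message key."""
    return df.groupBy("key").count()


def count_per_partition(df):
    """Toy analytic: number of records seen for each Kafka partition."""
    return df.groupBy("partition").count()


# Each topic gets its own analytic, so results are never mixed across topics.
TOPIC_ANALYTICS = {
    "clicks": count_per_key,
    "orders": count_per_partition,
}


def kafka_source_options(topic, bootstrap_servers="localhost:9092"):
    """Build the Kafka source options for a single topic."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def start_topic_query(spark, topic, analytic, checkpoint_root="/tmp/checkpoints"):
    """Start one streaming query that reads one topic and applies its analytic."""
    source = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(topic))
        .load()
    )
    return (
        analytic(source)
        .writeStream
        .queryName(f"{topic}_analytics")
        .outputMode("complete")  # aggregations re-emit the full result table
        .format("console")       # stand-in for a real downstream sink
        .option("checkpointLocation", f"{checkpoint_root}/{topic}")
        .start()
    )


def main():
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("multi-topic-analytics").getOrCreate()
    # start() returns immediately, so all queries run concurrently.
    for topic, analytic in TOPIC_ANALYTICS.items():
        start_topic_query(spark, topic, analytic)
    # Block the driver until any one of the queries stops or fails.
    spark.streams.awaitAnyTermination()
```

To try it, call `main()` from a script submitted with `spark-submit`, with the Spark Kafka connector package on the classpath. If you would rather isolate the jobs completely, the same per-topic function could be run as separate Spark applications, one per topic, at the cost of managing more deployments.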
