apache-spark


Netezza Drivers not available in Spark (Python Notebook) in DataScienceExperience


I have a project code in Python Notebook and it ran all good when Spark was hosted in Bluemix.
We are running the following code to connect to Netezza (on premises) which worked fine in Bluemix.
VT = sqlContext.read.format('jdbc').options(url='jdbc:netezza://169.54.xxx.x:xxxx/BACC_PRD_ISCNZ_GAPNZ',user='XXXXXX', password='XXXXXXX', dbtable='GRACE.CDVT_LIVE_SPARK', driver='org.netezza.Driver').load()'
However, after migration to DatascienceExperience, we are getting the following error. I have established the secure gateway and its all working fine, but this code is not running. I think the issue is with the Netezza driver. If it is the case, is there a way we can explicitly import the class/driver so the above code can be executed. Please help how we can address the issue.
Error Message:
/usr/local/src/spark20master/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
61 def deco(*a, **kw):
62 try:
---> 63 return f(*a, **kw)
64 except py4j.protocol.Py4JJavaError as e:
65 s = e.java_exception.toString()
/usr/local/src/spark20master/spark/python/lib/py4j-0.10.3-src.zip /py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
317 raise Py4JJavaError(
318 "An error occurred while calling {0}{1} {2}.\n".
--> 319 format(target_id, ".", name), value)
320 else:
321 raise Py4JError(
Py4JJavaError: An error occurred while calling o212.load.
: java.lang.ClassNotFoundException: org.netezza.driver
at java.net.URLClassLoader.findClass(URLClassLoader.java:607)
at java.lang.ClassLoader.loadClassHelper(ClassLoader.java:844)
at java.lang.ClassLoader.loadClass(ClassLoader.java:823)
at java.lang.ClassLoader.loadClass(ClassLoader.java:803)
at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:38)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createC onnectionFactory$1.apply(JdbcUtils.scala:49)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createC onnectionFactory$1.apply(JdbcUtils.scala:49)
at scala.Option.foreach(Option.scala:257)
You can install a jar file by adding a cell with an exclamation mark that runs a unix tool to download the file, in this example wget:
!wget https://some.public.host/yourfile.jar -P ${HOME}/data/libs
After downloading the file you will need to restart your kernel.
Note this approach assumes your jar file is publicly available on the Internet.
Notebooks in Bluemix and notebooks in DSX (Data Science Experience) currently use the same backend, so they have access to the same pre-installed drivers. Netezza isn't among them. As Chris Snow pointed out, users can install additional JARs and Python packages into their service instances.
You probably created a new service instance for DSX, and did not yet install the user JARs and packages that the old one had. It's a one-time setup, therefore easy to forget when you've been using the same instance for a while. Execute these commands in a Python notebook of the old instance on Bluemix to check for user-installed things:
!ls -lF ~/data/libs
!pip freeze
Then install the missing things into your new instance on DSX.
There is another way to connect to Netezza using ingest connector which
is by default enabled in DSX.
http://datascience.ibm.com/docs/content/analyze-data/python_load.html
from ingest import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
NetezzaloadOptions = {
Connectors.Netezza.HOST : 'hostorip',
Connectors.Netezza.PORT : 'port',
Connectors.Netezza.DATABASE : 'databasename',
Connectors.Netezza.USERNAME : 'xxxxx',
Connectors.Netezza.PASSWORD : 'xxxx',
Connectors.Netezza.SOURCE_TABLE_NAME : 'tablename'}
NetezzaDF = sqlContext.read.format("com.ibm.spark.discover").options(**NetezzaloadOptions).load()
NetezzaDF.printSchema()
NetezzaDF.show()
Thanks,
Charles.

Related Links

Spark + Cassandra on EMR LinkageError
Spark: Use Temporary Table Twice in Query?
How can I define my ENV variables once in the DockerFile and pass them down to my spark image which is submitted by a supervisord managed script?
Unbalanced keys lead to performance problems in Spark
How to remove null data from JavaPairRDD
Spark Streaming: How Spark and Kafka communication happens?
Error while invoking spark-shell on windows
Best way to iterate/stream a Spark Dataframe
Is it is required to be data in hive matastore to be used in sql-context from spark?
How to modify a Spark Dataframe with a complex nested structure?
Object not serializable error on org.apache.avro.generic.GenericData$Record
How to run Spark Sql on a 10 Node cluster
How to do group by range query
Visualising a Matrix
More than one hour to execute pyspark.sql.DataFrame.take(4)
How to map a JavaDstream object into a string? Spark Streaming and Model Prediction JAVA

Categories

HOME
azure
typescript
html5-canvas
dynamic-programming
document
hadoop
f#
wix
configuration
order
afp
prebuild
runtime-error
xamarin.forms-listview
wso2is
mvvm-light
sony
wget
autofac
eps
superfish
pie-chart
google-weather-api
w3.css
chart.js2
axios
ssrs-tablix
pywin32
h.264
jmeter-plugins
windows-applications
montecarlo
core-location
zos
padding
azure-redis-cache
weather-api
grouping
esri
code-signing
point-cloud-library
x-sendfile
temp
lync
reed-solomon
seh
server-sent-events
powerbuilder-conversion
joe-editor
drupal-webform
jsonstore
procobol
jpype
btle
dxgi
php-5.5
using
tern
spark-submit
free-diameter
android-studio-2.1
kendo-chart
ammonite
javascript-security
manifoldcf
callkit
xpdf
struts1
rras
angularjs-filter
isapi-rewrite
mutation
yapdatabase
training-data
uac
ssha
aldryn
core-data-migration
telescope
gem5
solr-boost
swift-array
json-web-token
spiceworks
office-2010
nine-patch
populate
azimuth
jqgrid-php
validform
preon
floating-point-conversion
ccombobox
tempo
alternate-data-stream
12factor
wpf-4.0
ad-hoc-distribution
ribbon-control
force.com
obout
httpconnection
ruby-1.9.2
blackberry-jde
sharepoint-timer-job
noaa
spquery

Resources

Mobile Apps Dev
Database Users
javascript
java
csharp
php
android
MS Developer
developer works
python
ios
c
html
jquery
RDBMS discuss
Cloud Virtualization
Database Dev&Adm
javascript
java
csharp
php
python
android
jquery
ruby
ios
html
Mobile App
Mobile App
Mobile App