

How Can collect_set find the source? [duplicate]


According to the docs, the collect_set and collect_list functions should be available in Spark SQL. However, I cannot get them to work. I'm running Spark 1.6.0 from a Docker image.
I'm trying to do this in Scala:
import org.apache.spark.sql.functions._
df.groupBy("column1")
  .agg(collect_set("column2"))
  .show()
And receive the following error at runtime:
Exception in thread "main" org.apache.spark.sql.AnalysisException: undefined function collect_set;
I also tried it with pyspark, but it fails as well. The docs state that these functions are aliases of Hive UDAFs, but I can't figure out how to enable them.
How can I fix this? Thanks!
Spark 2.0+:
SPARK-10605 introduced a native collect_list and collect_set implementation. A SparkSession with Hive support or a HiveContext is no longer required.
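For readers unsure what the two aggregates return, here is a plain-Python sketch of their grouping semantics. It does not use Spark at all; the helper names group_collect_list and group_collect_set and the sample rows are made up for illustration. Spark itself does not guarantee the order of the collected elements.

```python
from collections import defaultdict

def group_collect_list(rows, key, value):
    """Hypothetical helper: per key, gather every value, duplicates included."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return dict(groups)

def group_collect_set(rows, key, value):
    """Hypothetical helper: per key, gather only the distinct values."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[key]].add(row[value])
    return dict(groups)

# Made-up sample data using the column names from the question.
rows = [
    {"column1": "a", "column2": 1},
    {"column1": "a", "column2": 1},
    {"column1": "a", "column2": 2},
    {"column1": "b", "column2": 3},
]

as_lists = group_collect_list(rows, "column1", "column2")
as_sets = group_collect_set(rows, "column1", "column2")
```

The list variant keeps the duplicate 1 for key "a", while the set variant drops it.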
Spark 2.0-SNAPSHOT (before 2016-05-03):
You have to enable Hive support for a given SparkSession:
In Scala:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder
  .master("local")
  .appName("testing")
  .enableHiveSupport() // <- enable Hive support.
  .getOrCreate()
In Python:
from pyspark.sql import SparkSession
spark = (SparkSession.builder
    .enableHiveSupport()
    .getOrCreate())
Spark < 2.0:
To use Hive UDFs, you have to use Spark built with Hive support (the pre-built binaries already include it, which seems to be the case here) and create the SQL context as a HiveContext rather than a plain SQLContext.
In Scala:
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.SQLContext
val sqlContext: SQLContext = new HiveContext(sc)
In Python:
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
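The version split above could be wrapped in a small helper along these lines. This is only a sketch: needs_hive_support and make_sql_entry_point are hypothetical names, the check looks only at the major version (so it does not cover the early 2.0-SNAPSHOT builds mentioned above), and the pyspark imports are deferred so that only the branch matching the installed version is loaded.

```python
def needs_hive_support(version):
    """Hypothetical check: True when collect_set needs Hive (Spark < 2.0).

    Only the major version is inspected, so early 2.0-SNAPSHOT builds,
    which still required Hive support, are not detected.
    """
    major = int(version.split(".")[0])
    return major < 2

def make_sql_entry_point(sc):
    """Hypothetical helper: return an entry point on which collect_set resolves.

    `sc` is assumed to be an existing SparkContext.
    """
    if needs_hive_support(sc.version):
        # Spark 1.x: collect_set is a Hive UDAF, so a HiveContext is required.
        from pyspark.sql import HiveContext
        return HiveContext(sc)
    # Spark 2.0+: the native implementation works on a plain SparkSession.
    from pyspark.sql import SparkSession
    return SparkSession.builder.getOrCreate()
```

With Spark 1.6.0, as in the question, make_sql_entry_point(sc) would take the HiveContext branch.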
