apache-spark


Query A Nested Array in Parquet Records


I am trying different ways to query a record within an array of records and display the complete row as output.
I don't know which nested object contains the string "pg", but I want to query on that value: if any object has "pg", I want to display the complete row. How can I write a Spark SQL query on nested objects without specifying the object index? In other words, I don't want to use the index of children.name.
My Avro Record:
{
  "name": "Parent",
  "type": "record",
  "fields": [
    {"name": "firstname", "type": "string"},
    {
      "name": "children",
      "type": {
        "type": "array",
        "items": {
          "name": "child",
          "type": "record",
          "fields": [
            {"name": "name", "type": "string"}
          ]
        }
      }
    }
  ]
}
I am using a Spark SQLContext to query the DataFrame that was read from this data.
So if the input is

Row no  Firstname  Children.name
1       John       Max
                   Pg
2       Bru        huna
                   aman

the output should return row 1, since one of its children.name objects is pg.
val results = sqlc.sql("SELECT firstname, children.name FROM nestedread where children.name = 'pg'")
results.foreach(x=> println(x(0), x(1).toString))
The above query doesn't work, but it works when I query children[1].name.
I also want to know whether I can filter a set of records first and then explode, instead of exploding first, creating a large number of rows, and then filtering.
It seems that you can use
org.apache.spark.sql.functions.explode(e: Column): Column
For example, in my project (in Java), I have nested JSON like this:
{
  "error": [],
  "trajet": [
    {
      "something": "value"
    }
  ],
  "infos": [
    {
      "something": "value"
    }
  ],
  "timeseries": [
    {
      "something_0": "value_0",
      "something_1": "value_1",
      ...
      "something_n": "value_n"
    }
  ]
}
and I wanted to analyse the data in "timeseries", so I did:
DataFrame ts = jsonDF.select(org.apache.spark.sql.functions.explode(jsonDF.col("timeseries")).as("t"))
    .select("t.something_0",
            "t.something_1",
            ...
            "t.something_n");
I'm new to Spark too. Hope this gives you a hint.
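Applied to the schema in the question, the same explode call can be placed after a filter, so that only matching parents get exploded. This is a minimal sketch, not a tested solution: it assumes Spark 1.5 or later (where `array_contains` is available in `org.apache.spark.sql.functions`), and the object name, method name, and `child_name` alias are made up for illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{array_contains, col, explode}

object FilterThenExplode {
  // Hypothetical helper: `parents` is assumed to have the columns
  // firstname: string and children: array<struct<name: string>>.
  def childrenOfMatchingParents(parents: DataFrame, wanted: String): DataFrame =
    parents
      // children.name on an array of structs yields an array of the name values,
      // so the filter runs on the un-exploded rows.
      .filter(array_contains(col("children.name"), wanted))
      // Only the surviving rows are exploded, one output row per child.
      .select(col("firstname"), explode(col("children")).as("child"))
      .select(col("firstname"), col("child.name").as("child_name"))
}
```

Note that the comparison is case-sensitive, so "Pg" and "pg" are different values here.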
The problem was solved: I found a way through explode.
val results = sqlc.sql("SELECT firstname, child.name FROM parent LATERAL VIEW explode(children) childTable AS child")
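If the goal is only to return the parent rows that have a child named "pg", a filter on the whole array may avoid the explode entirely. The sketch below assumes Spark 1.5 or later, where `array_contains` is a built-in SQL function. The case classes and the sample rows are stand-ins for the Avro/Parquet data in the question, and `queryPg` is a hypothetical helper name.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

// Stand-ins for the Parent/child records described by the Avro schema.
case class Child(name: String)
case class Parent(firstname: String, children: Seq[Child])

object NestedArrayQuery {
  // children.name is an array of all child names in the row;
  // array_contains checks it without referring to any index.
  def queryPg(sqlc: SQLContext): DataFrame =
    sqlc.sql(
      "SELECT firstname, children.name FROM nestedread " +
        "WHERE array_contains(children.name, 'pg')")

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("nested-array-query")
    val sc = new SparkContext(conf)
    val sqlc = new SQLContext(sc)
    import sqlc.implicits._

    // Sample data mirroring the two rows in the question (lower-case "pg" assumed).
    val parents = Seq(
      Parent("John", Seq(Child("Max"), Child("pg"))),
      Parent("Bru", Seq(Child("huna"), Child("aman")))
    ).toDF()
    parents.registerTempTable("nestedread")

    queryPg(sqlc).collect().foreach(println)
    sc.stop()
  }
}
```

With this sample data only the John row should come back. The match is case-sensitive, so a stored value of "Pg" would not match 'pg'.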
