Query A Nested Array in Parquet Records


I am trying different ways to query a record within an array of records and display the complete row as output.
I don't know which nested object contains the string "pg", but I want to check whether any of them does. If "pg" exists, I want to display the complete row. How do I write a Spark SQL query on nested objects without specifying the object index? In other words, I don't want to use the index of children.name.
My Avro record:
{
  "name": "Parent",
  "type": "record",
  "fields": [
    {"name": "firstname", "type": "string"},
    {
      "name": "children",
      "type": {
        "type": "array",
        "items": {
          "name": "child",
          "type": "record",
          "fields": [
            {"name": "name", "type": "string"}
          ]
        }
      }
    }
  ]
}
I am using a Spark SQL context to query the DataFrame that was read from this data.
So if the input is

Row no  Firstname  Children.name
1       John       Max
                   Pg
2       Bru        huna
                   aman

the output should return row 1, since one object of children.name in that row is pg.
val results = sqlc.sql("SELECT firstname, children.name FROM nestedread where children.name = 'pg'")
results.foreach(x=> println(x(0), x(1).toString))
The above query doesn't work, but it works when I query children[1].name.
I also want to know whether I can filter a set of records first and then explode, instead of exploding first, creating a large number of rows, and then filtering.
It seems that you can use
org.apache.spark.sql.functions.explode(e: Column): Column
For example, in my project (in Java), I have nested JSON like this:
{
  "error": [],
  "trajet": [
    {
      "something": "value"
    }
  ],
  "infos": [
    {
      "something": "value"
    }
  ],
  "timeseries": [
    {
      "something_0": "value_0",
      "something_1": "value_1",
      ...
      "something_n": "value_n"
    }
  ]
}
I wanted to analyse the data in "timeseries", so I did:
DataFrame ts = jsonDF.select(org.apache.spark.sql.functions.explode(jsonDF.col("timeseries")).as("t"))
    .select("t.something_0",
            "t.something_1",
            ...
            "t.something_n");
I'm new to Spark too. Hope this gives you a hint.
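Applied to the question's Parent/children data, the same explode call could look roughly like this in Scala. This is only a sketch: the case classes, the in-memory sample rows, the `child_name` alias and the use of `SparkSession` (Spark 2.x or later, rather than the question's `sqlc`) are my assumptions, not something from the original post. String comparison is case-sensitive, so the sample uses a lowercase "pg".

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical case classes mirroring the Avro schema from the question.
case class Child(name: String)
case class Parent(firstname: String, children: Seq[Child])

// Local session for the sketch; in spark-shell this returns the existing one.
val spark = SparkSession.builder().master("local[*]").appName("explode-sketch").getOrCreate()
import spark.implicits._

// Made-up sample rows shaped like the table in the question.
val parents = Seq(
  Parent("John", Seq(Child("Max"), Child("pg"))),
  Parent("Bru", Seq(Child("huna"), Child("aman")))
).toDF()

// One output row per child, then keep only the children named "pg".
val exploded = parents
  .select(col("firstname"), explode(col("children")).as("child"))
  .where(col("child.name") === "pg")
  .select(col("firstname"), col("child.name").as("child_name"))

exploded.show()
```

Note that this explodes every row before filtering, which is the cost the question is asking how to avoid.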
The problem was solved by using explode with LATERAL VIEW:
val results = sqlc.sql("SELECT firstname, child.name FROM parent LATERAL VIEW explode(children) childTable AS child")
A WHERE clause on child.name can then be appended to keep only the matching children.
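On the second part of the question (filter first, then explode), one option that avoids the index is the built-in `array_contains` function, since `children.name` on an array of records evaluates to an array of the names. The following is a sketch under assumptions: the case classes, sample rows, column aliases and `SparkSession` setup are hypothetical, and the view is named `parent` only to match the query above. Whether filtering first is actually faster depends on the data and on what the optimizer does.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case classes mirroring the Avro schema from the question.
case class Child(name: String)
case class Parent(firstname: String, children: Seq[Child])

val spark = SparkSession.builder().master("local[*]").appName("array-contains-sketch").getOrCreate()
import spark.implicits._

// Made-up sample rows registered under the table name used in the answer above.
Seq(
  Parent("John", Seq(Child("Max"), Child("pg"))),
  Parent("Bru", Seq(Child("huna"), Child("aman")))
).toDF().createOrReplaceTempView("parent")

// Keep whole rows whose children array has a matching name; nothing is exploded.
val wholeRows = spark.sql(
  "SELECT firstname, children.name AS names FROM parent " +
  "WHERE array_contains(children.name, 'pg')")

// Filter first, then explode only the rows that survived the filter.
val filteredThenExploded = spark.sql(
  """SELECT firstname, child.name AS child_name
    |FROM (SELECT * FROM parent WHERE array_contains(children.name, 'pg')) p
    |LATERAL VIEW explode(children) childTable AS child""".stripMargin)

wholeRows.show()
filteredThenExploded.show()
```

The first query returns the complete matching row with all of its children; the second returns one row per child, but only for parents that passed the filter.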
