Spark Streaming (v2.1.0) refers to Kafka (v0.10.0) brokers by hostname (not by IP address)
I use the following Kafka settings:

```scala
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "10.30.3.41:9092,10.30.3.42:9092,10.30.3.43:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "123",
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)
```

All the Kafka brokers are listed by their IP addresses (as shown above). However, when I start the streaming context, I get the following error:

```
16/12/31 01:46:06 DEBUG NetworkClient: Error connecting to node 1 at broker1:9092:
java.io.IOException: Can't resolve address: broker1:9092
    at org.apache.kafka.common.network.Selector.connect(Selector.java:171)
    at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:498)
    at org.apache.kafka.clients.NetworkClient.access$400(NetworkClient.java:48)
    ...
Caused by: java.nio.channels.UnresolvedAddressException
    at sun.nio.ch.Net.checkAddress(Net.java:101)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
    at org.apache.kafka.common.network.Selector.connect(Selector.java:168)
    ... 31 more
```

`broker1` is the hostname of one of my brokers. Since I have not set up DNS in the cluster, this name is not resolvable from all nodes. I can work around the issue by adding all the broker hostnames to `/etc/hosts` on every node. Unfortunately, I really don't want to manage `/etc/hosts`, and I want to understand why Spark doesn't simply connect to the brokers via the IP addresses I explicitly list under `bootstrap.servers`.
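For completeness, this is roughly how the params are wired into the stream (a sketch assuming the spark-streaming-kafka-0-10 integration; `ssc` and `topics` are defined elsewhere in my code):

```scala
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// ssc is the StreamingContext, topics the collection of topic names
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams)
)
```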
I believe this is an issue with your Kafka configuration rather than with Spark. Most likely `listeners` and `advertised.listeners` are not set, or are configured to use the host name. If that's indeed the case, these values are advertised to the consumers and produce the behavior you observe. The `bootstrap.servers` list is only used for the initial connection; after that, the client connects to whatever addresses the brokers advertise in their metadata. Configuring the brokers to use IP addresses for these properties should resolve the problem:

```
# Adjust the security protocol according to your requirements
# and replace public_host_ip with the desired IP.
# listeners may also bind to 0.0.0.0:9092.
listeners=PLAINTEXT://public_host_ip:9092
advertised.listeners=PLAINTEXT://public_host_ip:9092
```
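To check what the brokers actually advertise after the change, here is a minimal sketch (not part of the original answer) that requests cluster metadata with a plain Kafka consumer and prints the advertised leader of each partition. It assumes the 0.10.x kafka-clients jar is on the classpath; the bootstrap address and the topic name `my_topic` are placeholders to adjust:

```scala
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.serialization.StringDeserializer

object CheckAdvertisedListeners {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "10.30.3.41:9092") // placeholder broker address
    props.put("key.deserializer", classOf[StringDeserializer].getName)
    props.put("value.deserializer", classOf[StringDeserializer].getName)
    props.put("group.id", "metadata-check")

    val consumer = new KafkaConsumer[String, String](props)
    try {
      // partitionsFor triggers a metadata request; the Node objects it returns
      // carry the host:port pairs the brokers advertise to clients.
      for (info <- consumer.partitionsFor("my_topic").asScala) {
        println(s"partition ${info.partition()} -> leader ${info.leader().host()}:${info.leader().port()}")
      }
    } finally {
      consumer.close()
    }
  }
}
```

If the printed hosts are still names like `broker1`, the brokers are still advertising hostnames and the configuration change has not taken effect.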