How to enable outside connections before submitting a PySpark job to Dataproc

I have a PySpark file that will be submitted to Dataproc. It writes a DataFrame to PostgreSQL over JDBC:

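from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pyspark_postgresql").getOrCreate()
sc = spark.sparkContext
# df is the DataFrame produced earlier in the job (not shown here)

try:
    print("Start writing")
    url = "jdbc:postgresql://some-ip:5432/postgres"
    properties = {
        "driver": "org.postgresql.Driver",
        "user": "postgres",
        "password": "root"
    }
    df.write.jdbc(url=url, table="result", mode="overwrite", properties=properties)

except Exception as e:
    print(e)
    sc.stop()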

I use the postgresql-42.6.0.jar JDBC driver, and my database is PostgreSQL 14.

Here is the error.

An error occurred while calling o86.jdbc.
: org.postgresql.util.PSQLException: The connection attempt failed.
        at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:331)
        at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
        at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:247)
        at org.postgresql.Driver.makeConnection(Driver.java:434)
        at org.postgresql.Driver.connect(Driver.java:291)
        at org.apache.spark.sql.execution.datasources.jdbc.connection.BasicConnectionProvider.getConnection(BasicConnectionProvider.scala:49)
...
Caused by: java.net.SocketTimeoutException: connect timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
...
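The root cause in the trace is `java.net.SocketTimeoutException: connect timed out`: the TCP handshake to `some-ip:5432` never completed, so the driver never got as far as authenticating. One way to confirm that the problem is the network rather than the JDBC driver is to attempt a plain TCP connection to the same host and port without Spark. Below is a minimal sketch, assuming Python 3 is available on the machine you test from; `can_reach` is a hypothetical helper name, not part of any library.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake,
        # which is the step that timed out inside the PostgreSQL JDBC driver.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Timeouts, refused connections and DNS failures are all OSError subclasses.
        return False
```

For example, calling `can_reach("some-ip", 5432)` on a cluster worker node (where the JDBC write actually runs) and getting `False` would point to a firewall or authorized-network restriction rather than a driver version issue.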

Here is how I submit my job through Google Cloud Shell:

gcloud beta dataproc jobs submit pyspark gs://taro-de-intern/pyspark_postgresql.py \
  --cluster my-cluster \
  --jars gs://my-bucket/postgresql-42.6.0.jar

I suspected it had something to do with the driver, so I downgraded the jar to version 42.4.2, but that didn't help and produced the same error.

I also tried rewriting the write with the generic format API:

(df.write
    .format("jdbc")
    .option("driver", "org.postgresql.Driver")
    .option("url", "jdbc:postgresql://some-ip:5432/postgres")
    .option("dbtable", "schema.result")
    .option("user", "postgres")
    .option("password", "root")
    .save())

This also yields the same error.

Answer

I've already sorted it out, so here is the solution. If you are using a cloud database (a SQL instance on GCP, AWS, or Azure), don't forget to allow outside connections.

Here is how to enable outside connections on a GCP Cloud SQL instance:

  1. On your SQL instance's Overview page, click Edit.

  2. Go to Connections.

  3. Click Add network (a new instance has no "allow all" entry yet). Enter a name for the entry (the name doesn't matter) and your IP address.
    For more information on subnet masks, see Microsoft's documentation on the topic.
    Note: allowing all connections (0.0.0.0/0) is only for the sake of this example; don't do it in production.

  4. Scroll down to the bottom and click save.
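Step 3 takes an IP address together with a subnet mask, written as a CIDR range such as `203.0.113.0/24`. The sketch below shows how such an entry matches a client address, using only Python's standard `ipaddress` module. `is_authorized` and `allows_everything` are hypothetical helpers, and the addresses are reserved documentation ranges, not values from the question. The address you authorize should be the one the connection actually comes from. For a Dataproc job, that is presumably the cluster nodes (assuming public-IP connectivity), not the machine you submit from.

```python
import ipaddress
from typing import Iterable


def is_authorized(client_ip: str, authorized_networks: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(client_ip)
    for cidr in authorized_networks:
        # strict=False tolerates entries with host bits set, e.g. "203.0.113.25/24".
        network = ipaddress.ip_network(cidr, strict=False)
        # Membership is simply False when the IP versions differ (IPv4 vs IPv6).
        if addr in network:
            return True
    return False


def allows_everything(cidr: str) -> bool:
    """Return True for an entry such as 0.0.0.0/0 that matches every address."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0
```

For example, `is_authorized("203.0.113.25", ["203.0.113.0/24"])` is `True`, while `allows_everything("0.0.0.0/0")` flags the kind of entry the note above warns against using in production.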




