How to set up a Spark standalone cluster with Docker to execute Python jobs from external containers?

I am trying to set up a Spark standalone cluster using the following docker-compose.yaml file:

spark:
  image: bitnami/spark:3.3.2
  environment:
    - SPARK_MODE=master
  ports:
    - '8081:8080'
    - '7077:7077'
spark-worker:
  image: bitnami/spark:3.3.2
  environment:
    - SPARK_MODE=worker
    - SPARK_MASTER_URL=spark://spark:7077
    - SPARK_WORKER_MEMORY=4G
    - SPARK_EXECUTOR_MEMORY=4G
    - SPARK_WORKER_CORES=4
  ports:
    - '8082:8081'

The idea is to keep the cluster (master and worker) running, ready to receive Spark jobs submitted from containers on other machines.

For example:

  • Machine A hosts the cluster (spark + spark-worker), created using docker-compose up
  • On machine B, I will create a Spark container to run a Python job using spark-submit --master spark://<machine-a-ip>:7077 (see the sketch after this list)
  • On machine C, I have an Airflow worker container that also tries to run a Python job using spark-submit
  • A Jupyter notebook container creates a Spark session to perform interactive computations.
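For concreteness, a submission from machine B would look roughly like the sketch below; 192.168.1.10 stands in for machine A's IP and job.py for the actual script, neither of which is given in the question. This is just the shape of the command; the networking issue described further down still applies to it.

# Hypothetical sketch: submit a PySpark job from machine B to the master on machine A.
# 192.168.1.10 and job.py are placeholders, not values from the question.
docker run --rm -v "$(pwd)/job.py:/opt/app/job.py" bitnami/spark:3.3.2 \
    spark-submit --master spark://192.168.1.10:7077 /opt/app/job.py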

I have successfully set up my Spark cluster on a single physical machine. The master correctly registers the worker node, and jobs submitted from inside the cluster (from the master and worker containers) run fine.

However, I encountered an issue when attempting to submit a job from a container running on another machine. While the job appears in the master's UI, it becomes stuck in an infinite loop in which executors are requested, created, and destroyed repeatedly:

23/04/16 22:06:39 INFO BlockManagerMaster: Removal of executor 9 requested
23/04/16 22:06:39 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 9
23/04/16 22:06:39 INFO BlockManagerMasterEndpoint: Trying to remove executor 9 from BlockManagerMaster.
23/04/16 22:06:39 INFO StandaloneSchedulerBackend: Granted executor ID app-20230416220604-0001/11 on hostPort 10.18.0.130:45783 with 1 core(s), 1024.0 MiB RAM
23/04/16 22:06:39 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230416220604-0001/11 is now RUNNING
23/04/16 22:06:45 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230416220604-0001/10 is now EXITED (Command exited with code 1)
23/04/16 22:06:45 INFO StandaloneSchedulerBackend: Executor app-20230416220604-0001/10 removed: Command exited with code 1
23/04/16 22:06:45 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20230416220604-0001/12 on worker-20230416220203-10.18.0.73-36107 (10.18.0.73:36107) with 1 core(s)
23/04/16 22:06:45 INFO StandaloneSchedulerBackend: Granted executor ID app-20230416220604-0001/12 on hostPort 10.18.0.73:36107 with 1 core(s), 1024.0 MiB RAM
23/04/16 22:06:45 INFO BlockManagerMaster: Removal of executor 10 requested
23/04/16 22:06:45 INFO BlockManagerMasterEndpoint: Trying to remove executor 10 from BlockManagerMaster.

I suspect this issue is caused by an incorrect network configuration, since the executors are probably unable to communicate properly with the driver. However, I am unsure how to fix it.

I tried exposing ports 10001 to 10005 on the container that submits the job and setting them as the values of spark.driver.port, spark.blockManager.port, and spark.driver.blockManager.port.
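That attempt looked roughly like the following sketch; the master address and script path are placeholders, and the port values are the ones mentioned above:

# Sketch of the attempted driver-port pinning; 192.168.1.10 and /opt/app/job.py
# are placeholders, not values from the question.
spark-submit \
    --master spark://192.168.1.10:7077 \
    --conf spark.driver.port=10001 \
    --conf spark.blockManager.port=10002 \
    --conf spark.driver.blockManager.port=10003 \
    /opt/app/job.py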

Additionally, I tried configuring spark.driver.host. However, it resulted in the following error:

23/04/27 17:47:10 ERROR SparkContext: Error initializing SparkContext.
java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries (starting from 10003)!

I also repeated the test above with the cluster (started with docker-compose) and the container running spark-submit (started with docker run bitnami/spark) on the same machine, setting spark.driver.host to the machine's host IP (obtained via hostname -I), but the error persisted.

My question is: how should the cluster and/or the individual containers be configured to solve this problem?

Answer

The extra_hosts feature will help you.
Here is a list of the settings that are mandatory for your goal:

  • spark.hostname
  • spark.environment.SPARK_LOCAL_HOSTNAME
  • spark.environment.SPARK_WORKER_WEBUI_PORT
  • spark.environment.SPARK_WORKER_PORT
  • spark.environment.SPARK_PUBLIC_DNS

Let's say we have two servers: 172.0.0.2 (spark master) and 172.0.0.3 (spark worker).

# at spark-master `docker-compose.yaml`
version: '3.7'

services:
  spark:
    image: bitnami/spark:3.3.2
    hostname: spark
    environment:
      - SPARK_MODE=master
      - SPARK_LOCAL_HOSTNAME=spark
    ports:
      - '8081:8080'
      - '7077:7077'
    extra_hosts:
      - "spark:172.0.0.2"
      - "spark-worker:172.0.0.3"

# at spark-worker `docker-compose.yaml`
version: '3.7'

services:
  spark-worker:
    image: bitnami/spark:3.3.2
    hostname: spark-worker
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_EXECUTOR_MEMORY=1G
      - SPARK_WORKER_CORES=2
      - SPARK_LOCAL_HOSTNAME=spark-worker
      - SPARK_WORKER_WEBUI_PORT=8080
      - SPARK_WORKER_PORT=8180
      - SPARK_PUBLIC_DNS=172.0.0.3  # for SPARK_WORKER_WEBUI_IP
    ports:
      - '8080:8080'            # for SPARK_WORKER_WEBUI_PORT
      - '8180-8280:8180-8280'  # for SPARK_WORKER_PORT
    volumes:
      - ${PWD}/tmp:/data      # for test run
    extra_hosts:
      - "spark:172.0.0.2"
      - "spark-worker:172.0.0.3"


Please keep track of the master and worker logs. If the configuration is wrong, each log will complain about which item is wrong.
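For example, you can follow both logs with docker-compose, using the service names defined in the compose files above:

# Follow the logs on each machine (service names as in the compose files above).
docker-compose logs -f spark          # on the master machine
docker-compose logs -f spark-worker   # on the worker machine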

extra_hosts updates the container's /etc/hosts. This is important because both the running driver and the running executors need to know where spark://spark:7077 is.
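A quick way to check that the entries landed correctly is to read /etc/hosts inside each container. The container names here are assumptions; use whatever docker ps shows for your compose project.

# Container names are assumptions; substitute the names from `docker ps`.
docker exec spark cat /etc/hosts          # should list both spark and spark-worker
docker exec spark-worker cat /etc/hosts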

Please adjust the port range based on how many executors run on each container. I leave a link below with further information about it.
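As a sketch of how the published range and the executor ports line up: if you pin the block manager base port, Spark tries 8180, 8181, and so on (up to spark.port.maxRetries) when binding, so the executors stay inside the 8180-8280 range published above. The script path is a placeholder:

# Pin executor/driver block manager ports into the published range (sketch).
spark-submit \
    --master spark://spark:7077 \
    --conf spark.blockManager.port=8180 \
    --conf spark.port.maxRetries=100 \
    /data/count.py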

Reference link. Thanks everyone!


If you want to verify that jobs run properly, follow these steps:

  • create a directory tmp in the path where docker-compose.yaml is located
  • create count.py in tmp:
    # count.py: a minimal word-count job to smoke-test the cluster
    try:
      from pyspark import SparkContext, SparkConf
      from operator import add
    except Exception as e:
      print(e)

    def get_counts():
      words = "test test"
      conf = SparkConf().setAppName('letter count')
      sc = SparkContext(conf=conf)
      seq = words.split()
      data = sc.parallelize(seq)
      counts = data.map(lambda word: (word, 1)).reduceByKey(add).collect()
      sc.stop()
      print('\n{0}'.format(dict(counts)))

    if __name__ == "__main__":
      get_counts()

  • run it on the "spark-worker" node with
    docker exec {containerID} bin/spark-submit --master spark://spark:7077 /data/count.py
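If the cluster and the network settings are wired up correctly, the job should finish and print {'test': 2}.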



