Question

I am trying to set up a Spark cluster using the following docker-compose.yaml file:

services:
  spark:
    image: bitnami/spark:3.3.2
    environment:
      - SPARK_MODE=master
    ports:
      - "8081:8080"
      - "7077:7077"
  spark-worker:
    image: bitnami/spark:3.3.2
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark:7077
      - SPARK_WORKER_MEMORY=4G
      - SPARK_EXECUTOR_MEMORY=4G
      - SPARK_WORKER_CORES=4
    ports:
      - "8082:8081"

The idea is to keep the cluster (master and worker) running, ready to receive Spark jobs from containers on other machines.

For example:

Machine A hosts the cluster (spark + worker), created using docker-compose up.
On machine B, I will create a Spark container to run a Python job using spark-submit --master spark://:7077
On machine C, I have an Airflow worker container that also tries to run a Python job using spark-submit.
A Jupyter notebook container creates a Spark session to perform interactive computations.

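I have successfully set up my Spark cluster on a single physical machine. The master identifies the worker node correctly, and jobs submitted from within the cluster (from the master and worker nodes) run fine.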

However, I ran into an issue when submitting a job from a container running on another machine. The job appears in the master's UI, but it gets stuck in an infinite loop in which executors are requested, created, and destroyed repeatedly.

23/04/16 22:06:39 INFO BlockManagerMaster: Removal of executor 9 requested
23/04/16 22:06:39 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 9
23/04/16 22:06:39 INFO BlockManagerMasterEndpoint: Trying to remove executor 9 from BlockManagerMaster.
23/04/16 22:06:39 INFO StandaloneSchedulerBackend: Granted executor ID app-20230416220604-0001/11 on hostPort 10.18.0.130:45783 with 1 core(s), 1024.0 MiB RAM
23/04/16 22:06:39 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230416220604-0001/11 is now RUNNING
23/04/16 22:06:45 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230416220604-0001/10 is now EXITED (Command exited with code 1)
23/04/16 22:06:45 INFO StandaloneSchedulerBackend: Executor app-20230416220604-0001/10 removed: Command exited with code 1
23/04/16 22:06:45 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20230416220604-0001/12 on worker-20230416220203-10.18.0.73-36107 (10.18.0.73:36107) with 1 core(s)
23/04/16 22:06:45 INFO StandaloneSchedulerBackend: Granted executor ID app-20230416220604-0001/12 on hostPort 10.18.0.73:36107 with 1 core(s), 1024.0 MiB RAM
23/04/16 22:06:45 INFO BlockManagerMaster: Removal of executor 10 requested
23/04/16 22:06:45 INFO BlockManagerMasterEndpoint: Trying to remove executor 10 from BlockManagerMaster.

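I believe the problem is most likely caused by an incorrect network configuration, since the executors may be unable to communicate properly with the driver. However, I am not sure how to resolve it.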

I tried exposing ports 10001 to 10005 in the container that submits the application and using those ports as the values for port settings such as spark.driver.port and spark.driver.blockManager.port.

In addition, I tried configuring the Spark driver's address. However, that produced the following error:

23/04/27 17:47:10 ERROR SparkContext: Error initializing SparkContext.
java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries (starting from 10003)!

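I also repeated the tests above with the cluster (started with docker-compose) and the container running the Spark job (started with docker run) on the same machine, and set the driver address to the host machine's IP address (obtained with hostname -I), but the error persists.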

My question is: how should I configure the cluster and/or the individual containers to solve this problem?

Answer 1

The extra_hosts feature will help you.
Here is the list of settings that are mandatory for your goal:

spark.hostname
spark.environment.SPARK_LOCAL_HOSTNAME
spark.environment.SPARK_WORKER_WEBUI_PORT
spark.environment.SPARK_WORKER_PORT
spark.environment.SPARK_PUBLIC_DNS

Let's assume we have 2 servers: 172.0.0.2 (spark) and 172.0.0.3 (spark-worker).

# at spark-master `docker-compose.yaml`
version: "3.7"

services:
  spark:
    image: bitnami/spark:3.3.2
    hostname: spark
    environment:
      - SPARK_MODE=master
      - SPARK_LOCAL_HOSTNAME=spark
    ports:
      - "8081:8080"
      - "7077:7077"
    extra_hosts:
      - "spark:172.0.0.2"
      - "spark-worker:172.0.0.3"

# at spark-worker `docker-compose.yaml`
version: "3.7"

services:
  spark-worker:
    image: bitnami/spark:3.3.2
    hostname: spark-worker
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_EXECUTOR_MEMORY=1G
      - SPARK_WORKER_CORES=2
      - SPARK_LOCAL_HOSTNAME=spark-worker
      - SPARK_WORKER_WEBUI_PORT=8080
      - SPARK_WORKER_PORT=8180
      - SPARK_PUBLIC_DNS=172.0.0.3  # for SPARK_WORKER_WEBUI_IP
    ports:
      - "8080:8080"            # for SPARK_WORKER_WEBUI_PORT
      - "8180-8280:8180-8280"  # for SPARK_WORKER_PORT
    volumes:
      - ${PWD}/tmp:/data      # for test run
    extra_hosts:
      - "spark:172.0.0.2"
      - "spark-worker:172.0.0.3"

Please keep an eye on the master and worker logs. If the configuration is wrong, each log will report which item is wrong.

extra_hosts updates the container's /etc/hosts. This is important because both the running driver and the running executors need to know where spark://spark:7077 is.
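
A quick way to confirm that the /etc/hosts entries and the published ports actually work is to test name resolution and a TCP connection from inside each container. The following is a minimal sketch using only the Python standard library; the helper name can_reach is illustrative and not part of Spark or Docker.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if `host` resolves and accepts a TCP connection on `port`."""
    try:
        # create_connection resolves the name first, then connects.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers resolution failures, refused connections, and timeouts.
        return False
```

For example, calling can_reach("spark", 7077) inside the spark-worker container should return True once the master is up. A False result suggests checking the extra_hosts mapping or the published ports first.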

Please adjust the port range based on how many executors run in each container. I am leaving a link with further information on this.

Reference link. Thanks, everyone!

If you need to verify that everything works properly, follow these steps.

Create a directory named tmp in the directory where docker-compose.yaml is located.

Create count.py in tmp:

from operator import add

from pyspark import SparkConf, SparkContext


def get_counts():
    """Count word occurrences in a small test string on the cluster."""
    words = "test test"
    conf = SparkConf().setAppName("letter count")
    sc = SparkContext(conf=conf)
    seq = words.split()
    data = sc.parallelize(seq)
    # Map each word to (word, 1), then sum the counts per word.
    counts = data.map(lambda word: (word, 1)).reduceByKey(add).collect()
    sc.stop()
    print("\n{0}\n".format(dict(counts)))


if __name__ == "__main__":
    get_counts()

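Run it on the "spark-worker" node with:
docker exec {containerID} bin/spark-submit --master spark://spark:7077 --class endpoint /data/count.py

Note that --class applies to Java/Scala applications, so it can be omitted for a Python script.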
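
For the case in the question where the driver runs in a container outside the cluster (for example a Jupyter notebook), the executors also have to connect back to the driver. The following is a hedged sketch, not a verified fix for this exact setup. It assumes the driver container publishes ports 10001 and 10002 (two of the ports from the question) and that advertised_host is an address the workers can route to. The helper names and the address 172.0.0.4 are hypothetical. spark.driver.bindAddress defaults to the value of spark.driver.host, so advertising a host IP that does not exist inside the container can lead to "Cannot assign requested address" unless the bind address is set separately.

```python
def driver_network_conf(advertised_host, driver_port=10001, block_manager_port=10002):
    """Build Spark settings so executors can connect back to a containerized driver."""
    return {
        # Address the executors use to reach the driver (must be routable from the workers).
        "spark.driver.host": advertised_host,
        # Bind on all interfaces inside the container; the host IP is not a local address there.
        "spark.driver.bindAddress": "0.0.0.0",
        # Fixed ports so they can be published by the driver container.
        "spark.driver.port": str(driver_port),
        "spark.driver.blockManager.port": str(block_manager_port),
    }


def build_session(master_url, conf):
    """Create a SparkSession with the given settings (requires pyspark)."""
    # Imported lazily so driver_network_conf can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master(master_url).appName("remote driver sketch")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

A notebook on a hypothetical third machine could then call build_session("spark://spark:7077", driver_network_conf("172.0.0.4")), provided its container was started with those two ports published and with the same extra_hosts entries.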
