I am trying to use the following docker-compose.yaml file:
```yaml
spark:
  image: bitnami/spark:3.3.2
  environment:
    - SPARK_MODE=master
  ports:
    - 8081:8080
    - 7077:7077
spark-worker:
  image: bitnami/spark:3.3.2
  environment:
    - SPARK_MODE=worker
    - SPARK_MASTER_URL=spark://spark:7077
    - SPARK_WORKER_MEMORY=4G
    - SPARK_EXECUTOR_MEMORY=4G
    - SPARK_WORKER_CORES=4
  ports:
    - 8082:8081
```
The idea is to keep the cluster (master and worker) running and ready to receive Spark jobs from containers on other machines.
For example:
- Machine A hosts the cluster (master + worker), created with `docker-compose up`.
- On machine B, I create a Spark container that runs a Python job using `spark-submit --master spark://:7077`.
- On machine C, an Airflow worker container also runs a Python job using `spark-submit`.
- A Jupyter notebook container creates a Spark session to perform interactive computations.
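Submitting from machine B comes down to pointing `spark-submit` at the standalone master's URL. A minimal Python sketch that assembles that command line, assuming `MACHINE_A_IP` as a hypothetical placeholder for machine A's address (it is not given above) and `job.py` as a hypothetical script name:

```python
import shlex
from typing import List


def build_spark_submit(master_host: str, app: str, master_port: int = 7077) -> List[str]:
    """Assemble a spark-submit argument list targeting a standalone master."""
    return [
        "spark-submit",
        "--master",
        f"spark://{master_host}:{master_port}",  # standalone master URL
        app,
    ]


if __name__ == "__main__":
    # MACHINE_A_IP and job.py are hypothetical placeholders.
    cmd = build_spark_submit("MACHINE_A_IP", "job.py")
    print(shlex.join(cmd))  # spark-submit --master spark://MACHINE_A_IP:7077 job.py
```

This only reaches the master because port 7077 is published on machine A in the compose file; the executors that the master then launches still have to connect back to the driver on machine B.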
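I have successfully set up my Spark cluster on a single physical machine. The master correctly registers the worker node, and jobs submitted from inside the cluster (from the master and worker nodes) run without problems.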
However, I run into a problem when submitting a job from a container running on another machine. Although the job appears in the master's UI, it gets stuck in an endless loop in which executors are repeatedly requested, created, and destroyed:
```
23/04/16 22:06:39 INFO BlockManagerMaster: Removal of executor 9 requested
23/04/16 22:06:39 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 9
23/04/16 22:06:39 INFO BlockManagerMasterEndpoint: Trying to remove executor 9 from BlockManagerMaster.
23/04/16 22:06:39 INFO StandaloneSchedulerBackend: Granted executor ID app-20230416220604-0001/11 on hostPort 10.18.0.130:45783 with 1 core(s), 1024.0 MiB RAM
23/04/16 22:06:39 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230416220604-0001/11 is now RUNNING
23/04/16 22:06:45 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230416220604-0001/10 is now EXITED (Command exited with code 1)
23/04/16 22:06:45 INFO StandaloneSchedulerBackend: Executor app-20230416220604-0001/10 removed: Command exited with code 1
23/04/16 22:06:45 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20230416220604-0001/12 on worker-20230416220203-10.18.0.73-36107 (10.18.0.73:36107) with 1 core(s)
23/04/16 22:06:45 INFO StandaloneSchedulerBackend: Granted executor ID app-20230416220604-0001/12 on hostPort 10.18.0.73:36107 with 1 core(s), 1024.0 MiB RAM
23/04/16 22:06:45 INFO BlockManagerMaster: Removal of executor 10 requested
23/04/16 22:06:45 INFO BlockManagerMasterEndpoint: Trying to remove executor 10 from BlockManagerMaster.
```
I suspect the problem is an incorrect network configuration: the executors probably cannot communicate with the driver. However, I am not sure how to fix it.
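One way to test this hypothesis is to check, from machine A (where the executors run), whether the driver's port on the submitting machine accepts TCP connections at all. A minimal Python sketch; the host and port you pass are your own values, and a failed connection is only a symptom, not proof of one specific misconfiguration:

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError),
        # and name-resolution failures (socket.gaierror is an OSError).
        return False
```

For example, `can_reach("<submitting machine IP>", <driver port>)` run inside a worker container while the job is looping; `False` suggests the driver's port is not published or is bound to an address the workers cannot route to.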
I tried exposing ports 10001 to 10005 on the container that submits the application and using them as the values of `spark.driver.port`, the other driver-side port properties, and `spark.driver.blockManager.port`.
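Pinning the driver-side ports to published ports can be expressed as `--conf` arguments. A minimal Python sketch, assuming only the two property names legible above plus `spark.port.maxRetries` (a real Spark setting whose default of 16 matches the "16 retries" in the error below); setting it to 0 is an assumption meant to stop Spark from stepping onto unpublished ports, and the port numbers are example picks from the 10001 to 10005 range:

```python
from typing import Dict, List


def driver_port_conf(driver_port: int, block_manager_port: int, max_retries: int = 0) -> Dict[str, str]:
    """Pin driver-side ports so they match the ports published from the container."""
    return {
        "spark.driver.port": str(driver_port),
        "spark.driver.blockManager.port": str(block_manager_port),
        # By default Spark retries successive ports (up to 16) when a bind fails,
        # which can move the driver onto a port that was never published.
        "spark.port.maxRetries": str(max_retries),
    }


def to_conf_args(conf: Dict[str, str]) -> List[str]:
    """Turn a property dict into repeated `--conf key=value` spark-submit arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Example values picked from the published 10001-10005 range.
    print(to_conf_args(driver_port_conf(10001, 10002)))
```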
I also tried configuring the driver's host address. However, this caused the following error:
```
23/04/27 17:47:10 ERROR SparkContext: Error initializing SparkContext.
java.net.BindException: Cannot assign requested address: Service sparkDriver failed after 16 retries (starting from 10003)!
```
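I also ran the same test with the cluster (started with `docker-compose up`) and the container running the Spark job (started with `docker run`) on the same machine, setting the driver address to the machine's host IP (obtained with `hostname -I`), but the error persists.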
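`Cannot assign requested address` generally means a process tried to bind a socket to an IP that is not assigned to any of its network interfaces. With Docker's default bridge network, the host's IP reported by `hostname -I` on the host is not an interface address inside the container. Spark separates the address it advertises to executors (`spark.driver.host`) from the address it binds to (`spark.driver.bindAddress`). A minimal Python sketch of that split, assuming the advertised IP is the first entry of `hostname -I` run on the host (not inside the container) and that the driver ports are published with `-p`:

```python
from typing import Dict


def first_ip(hostname_i_output: str) -> str:
    """Return the first address printed by `hostname -I` (a space-separated list)."""
    addresses = hostname_i_output.split()
    if not addresses:
        raise ValueError("no addresses in `hostname -I` output")
    return addresses[0]


def driver_network_conf(advertised_ip: str) -> Dict[str, str]:
    """Advertise the host's IP to executors while binding on all container interfaces."""
    return {
        # Address executors use to reach the driver; must be routable from machine A.
        "spark.driver.host": advertised_ip,
        # Address the driver binds to inside the container; 0.0.0.0 means all interfaces.
        "spark.driver.bindAddress": "0.0.0.0",
    }
```

An alternative under the same assumption is running the submitting container with `--network host`, so the host's interfaces are visible inside it and no separate bind address is needed.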
My question is: how should I configure the cluster and/or the individual containers to solve this problem?