Which combination of pyspark and delta versions should I use?

I'm using the jupyter/pyspark-notebook Docker image to develop a Spark script. My Dockerfile looks like this:

FROM jupyter/pyspark-notebook

USER root
COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt && rm requirements.txt

# this is a default user and the image is configured to use it
ARG NB_USER=jovyan
ARG NB_UID=1000
ARG NB_GID=100

ENV USER ${NB_USER}
ENV HOME /home/${NB_USER}
RUN groupadd -f ${USER} && \
    chown -R ${USER}:${USER} ${HOME}

USER ${NB_USER}

RUN export PACKAGES="io.delta:delta-core_2.12:1.0.0"
RUN export PYSPARK_SUBMIT_ARGS="--packages ${PACKAGES} pyspark-shell"

My requirements.txt:

delta-spark==2.1.0
deltalake==0.10.1
jupyterlab==4.0.6
pandas==2.1.0
pyspark==3.3.3
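Since the question is really about which releases go together, a small lookup makes the pairing explicit. This is a sketch only: the table is an assumption transcribed from Delta Lake's published "compatibility with Apache Spark" matrix and covers just a few releases, so check it against the official Delta documentation. The helper names `minor` and `is_compatible` are hypothetical. It compares both the pip pin (delta-spark 2.1.0) and the jar pinned in the Dockerfile (`io.delta:delta-core_2.12:1.0.0`) against pyspark 3.3.3.

```python
# Hypothetical helper. ASSUMPTION: this mapping is transcribed from Delta Lake's
# Spark compatibility table; verify it against the official docs before use.
DELTA_TO_SPARK = {
    "1.0": "3.1",
    "1.1": "3.2",
    "1.2": "3.2",
    "2.0": "3.2",
    "2.1": "3.3",
    "2.2": "3.3",
    "2.3": "3.3",
    "2.4": "3.4",
    "3.0": "3.5",
}


def minor(version: str) -> str:
    """Return the 'major.minor' prefix of a version string such as '3.3.3'."""
    major, minor_part = version.split(".")[:2]
    return f"{major}.{minor_part}"


def is_compatible(delta_version: str, spark_version: str) -> bool:
    """True if the table pairs this Delta release with this Spark minor line."""
    expected = DELTA_TO_SPARK.get(minor(delta_version))
    if expected is None:
        raise ValueError(f"delta {delta_version} is not in the lookup table")
    return minor(spark_version) == expected


if __name__ == "__main__":
    # pip pins from requirements.txt: delta-spark 2.1.0 with pyspark 3.3.3
    print(is_compatible("2.1.0", "3.3.3"))  # True
    # jar pinned via PACKAGES in the Dockerfile: delta-core 1.0.0 with pyspark 3.3.3
    print(is_compatible("1.0.0", "3.3.3"))  # False
```

Under this table the pip packages agree with each other, while the 1.0.0 jar targets the Spark 3.1 line.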

I build and run the image via docker compose, and then attempt to run this in a notebook:

import pyspark
from delta import *

builder = pyspark.sql.SparkSession.builder.appName("LocalDelta") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")

spark = configure_spark_with_delta_pip(builder).getOrCreate()

This produces the following error:

AttributeError                            Traceback (most recent call last)
Cell In[2], line 2
      1 import pyspark
----> 2 from delta import *
      4 builder = pyspark.sql.SparkSession.builder.appName("LocalDelta") \
      5     .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
      6     .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      8 spark = configure_spark_with_delta_pip(builder).getOrCreate()

File /opt/conda/lib/python3.11/site-packages/delta/__init__.py:17
      1 #
      2 # Copyright (2021) The Delta Lake Project Authors.
      3 #
   (...)
     14 # limitations under the License.
     15 #
---> 17 from delta.tables import DeltaTable
     18 from delta.pip_utils import configure_spark_with_delta_pip
     20 __all__ = ['DeltaTable', 'configure_spark_with_delta_pip']

File /opt/conda/lib/python3.11/site-packages/delta/tables.py:21
      1 #
      2 # Copyright (2021) The Delta Lake Project Authors.
      3 #
   (...)
     14 # limitations under the License.
     15 #
     17 from typing import (
     18     TYPE_CHECKING, cast, overload, Any, Iterable, Optional, Union, NoReturn, List, Tuple
     19 )
---> 21 import delta.exceptions  # noqa: F401; pylint: disable=unused-variable
     22 from delta._typing import (
     23     ColumnMapping, OptionalColumnMapping, ExpressionOrColumn, OptionalExpressionOrColumn
     24 )
     26 from pyspark import since

File /opt/conda/lib/python3.11/site-packages/delta/exceptions.py:166
    162     utils.convert_exception = convert_delta_exception
    165 if not _delta_exception_patched:
--> 166     _patch_convert_exception()
    167     _delta_exception_patched = True

File /opt/conda/lib/python3.11/site-packages/delta/exceptions.py:154, in _patch_convert_exception()
    149 def _patch_convert_exception() -> None:
    150     """
    151     Patch PySpark's exception convert method to convert Delta's Scala concurrent exceptions to the
    152     corresponding Python exceptions.
    153     """
--> 154     original_convert_sql_exception = utils.convert_exception
    156     def convert_delta_exception(e: "JavaObject") -> CapturedException:
    157         delta_exception = _convert_delta_exception(e)

AttributeError: module 'pyspark.sql.utils' has no attribute 'convert_exception'
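The traceback fails inside `delta/exceptions.py`, where delta-spark reads `convert_exception` from `pyspark.sql.utils` at import time. Our understanding (not confirmed by the question) is that newer pyspark releases (3.4+) moved that function elsewhere, so one plausible explanation is that the kernel imports a different pyspark than the pinned 3.3.3, for example one shipped with the base image. The hypothetical helper `report_versions` below only reports what the running kernel sees; it does not fix anything.

```python
# Diagnostic sketch: report which pyspark/delta-spark the running kernel sees.
# ASSUMPTION: a pyspark other than the pinned one may be first on sys.path.
import importlib
import importlib.metadata


def report_versions() -> dict:
    """Collect installed distribution versions and details of the imported pyspark."""
    info = {}
    for dist in ("pyspark", "delta-spark"):
        try:
            info[dist] = importlib.metadata.version(dist)
        except importlib.metadata.PackageNotFoundError:
            info[dist] = None  # no installed distribution metadata found

    try:
        pyspark = importlib.import_module("pyspark")
    except ImportError:
        pyspark = None
    # The module actually imported may differ from the pip-installed distribution.
    info["imported_pyspark_version"] = getattr(pyspark, "__version__", None)
    info["imported_pyspark_path"] = getattr(pyspark, "__file__", None)

    try:
        utils = importlib.import_module("pyspark.sql.utils")
        # delta-spark 2.1.0 looks this attribute up when `delta` is imported.
        info["has_convert_exception"] = hasattr(utils, "convert_exception")
    except ImportError:
        info["has_convert_exception"] = None
    return info


if __name__ == "__main__":
    for key, value in report_versions().items():
        print(f"{key}: {value}")
```

If `imported_pyspark_version` differs from the pip-installed `pyspark`, or the path points outside `/opt/conda/lib/python3.11/site-packages`, the pin in requirements.txt is not the pyspark being used.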

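It seems the pyspark and delta versions are incompatible with each other, but I haven't been able to find anything on Stack Overflow or elsewhere to point me in the right direction. I based my setup on this example: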

Any help would be much appreciated.

Answer

The delta-spark version you install with pip (2.1.0) has to match the version of the Delta jar added via --packages. So set:

RUN export PACKAGES="io.delta:delta-core_2.12:2.1.0"
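One way to keep the two versions in lockstep is to derive the jar coordinate from the installed delta-spark version instead of hard-coding it. This is a sketch under assumptions: Scala 2.12 and the `delta-core` artifact name used in the question, with hypothetical helper names. Note also that each Dockerfile `RUN` executes in its own shell, so a variable set with `RUN export` does not persist into later layers or the running container; `ENV` is the instruction that persists variables. Setting `PYSPARK_SUBMIT_ARGS` from Python before the first SparkSession is created is another option. As we understand it, `configure_spark_with_delta_pip` already requests the jar matching the installed delta-spark, so this mainly matters if you also pass `--packages` yourself.

```python
# Sketch: derive the Delta jar coordinate from the installed delta-spark version
# so --packages cannot drift from requirements.txt.
# ASSUMPTIONS: Scala 2.12 and the "delta-core" artifact name from the question.
import importlib.metadata
import os


def delta_coordinate(delta_version: str, scala_version: str = "2.12") -> str:
    """Maven coordinate of the Delta jar matching a delta-spark release."""
    return f"io.delta:delta-core_{scala_version}:{delta_version}"


def submit_args(packages: str) -> str:
    """Value for PYSPARK_SUBMIT_ARGS; spark-submit expects the trailing 'pyspark-shell'."""
    return f"--packages {packages} pyspark-shell"


if __name__ == "__main__":
    try:
        version = importlib.metadata.version("delta-spark")
    except importlib.metadata.PackageNotFoundError:
        version = "2.1.0"  # fallback: the pin from requirements.txt
    # Must be set before the first SparkSession (and its JVM) is created.
    os.environ["PYSPARK_SUBMIT_ARGS"] = submit_args(delta_coordinate(version))
    print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

With delta-spark 2.1.0 installed, this sets `--packages io.delta:delta-core_2.12:2.1.0 pyspark-shell`, the coordinate the answer recommends.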