How to convert a PATINDEX SQL operation into PySpark SQL

I have to convert the following SQL statement to PySpark.

select case when LEN(travel_time) = 1 then
concat('0', SUBSTRING(travel_time, PATINDEX('%[^0]%', travel_time), LEN(travel_time)+1), ':00:00')
else concat(SUBSTRING(travel_time, PATINDEX('%[^0]%', travel_time), LEN(travel_time)+1), ':00:00') end as travel_time from sampleTable

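[Image: output of the query above]

That output comes from the actual MS SQL query, which already solves the problem there. What we need is PySpark code that behaves like PATINDEX.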

Answers

I have tried the below approach in pyspark:

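from pyspark.sql import SparkSession
from pyspark.sql.functions import concat, lpad, regexp_extract, length, when
spark = SparkSession.builder.getOrCreate()  # reuse the active session if there is one
dilip_df = spark.createDataFrame(
    [("1:15:00",), ("07:45:00",), ("0 2:23:00",), ("0 6:50:00",)],
    ["travel_time"]
)
dilip_df.select(
    concat(
        when(
            # single-character value: zero-pad the digits to two characters
            length(dilip_df["travel_time"]) == 1,
            concat(
                lpad(
                    regexp_extract(dilip_df["travel_time"], r"\d+", 0),
                    2, "0"
                ),
                regexp_extract(dilip_df["travel_time"], r":\d+:\d+", 0)
            )
        ).otherwise(
            # longer values: keep the first "digit, optional colon, digit, colon, digits" match
            regexp_extract(dilip_df["travel_time"], r"\d:?\d:\d+", 0)
        )
    ).alias("travel_time")
).show()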
+-----------+
|travel_time|
+-----------+
|      15:00|
|      07:45|
|      23:00|
|      50:00|
+-----------+

How the code above transforms travel_time:

If the length of travel_time is 1, the value is a single digit. In that case the code extracts the numeric part with regexp_extract, left-pads it with zeros to two digits with lpad, and concatenates it with the rest of the time string, which is obtained with a second regexp_extract.

If the length is not 1, the value has more than one character, and the code extracts the time part with a single regexp_extract.

The result is a column aliased travel_time that contains the transformed values.
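The branching described above can be checked without a Spark session. The following is a minimal plain-Python sketch, not the answer's own code: `regexp_extract` and `normalize` here are hypothetical helpers that mimic the Spark functions with Python's `re` module, on the assumption that these simple patterns behave the same in Java and Python regular expressions.

```python
import re


def regexp_extract(value: str, pattern: str, idx: int) -> str:
    """Hypothetical stand-in for Spark's regexp_extract: first match, or '' if none."""
    match = re.search(pattern, value)
    return match.group(idx) if match else ""


def normalize(travel_time: str) -> str:
    """Mirror the when/otherwise logic of the PySpark snippet for one value."""
    if len(travel_time) == 1:
        # Single character: zero-pad the digits to width 2 (what lpad does here)
        # and append the ":mm:ss" tail, which is empty for a lone digit.
        digits = regexp_extract(travel_time, r"\d+", 0).rjust(2, "0")
        return digits + regexp_extract(travel_time, r":\d+:\d+", 0)
    # Otherwise keep the first "digit, optional colon, digit, colon, digits" match.
    return regexp_extract(travel_time, r"\d:?\d:\d+", 0)


for sample in ["1:15:00", "07:45:00", "0 2:23:00", "0 6:50:00", "5"]:
    print(repr(sample), "->", repr(normalize(sample)))
```

For the four sample rows this reproduces the values in the output table, which shows that the second regular expression keeps only the first two-digit group, the colon after it, and the digits that follow.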

Using spark.sql:

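from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # reuse the active session if there is one
dilip_df = spark.createDataFrame(
    [("1:15:00",), ("07:45:00",), ("02:23:00",), ("06:50:00",)],
    ["travel_time"]
)
dilip_df.createOrReplaceTempView("dilip_table")
# RLIKE tests for a match directly; REGEXP_EXTRACT returns '' (not NULL) when nothing matches,
# so an IS NOT NULL check on it would be true for every row.
# [0-9] is used instead of \d so the patterns do not depend on backslash escaping in SQL string literals.
query = """
    SELECT
        travel_time,
        CASE
            WHEN travel_time RLIKE '^[0-9]{1,2}:[0-9]{2}:[0-9]{2}$' THEN 'HH:MM:SS'
            WHEN travel_time RLIKE '^0[0-9]{1}:[0-9]{2}:[0-9]{2}$' THEN '0H:MM:SS'
            WHEN travel_time RLIKE '^0[0-9]{1} [0-9]{1,2}:[0-9]{2}:[0-9]{2}$' THEN '0H HM:SS'
            ELSE NULL
        END AS format
    FROM dilip_table
"""
result = spark.sql(query)
result.show()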

The spark.sql version registers the DataFrame as a temporary view and uses a CASE WHEN expression to label each travel_time value with the format it matches. The branches are evaluated in order, so a value such as 07:45:00 is already caught by the first pattern, and the result is NULL when no pattern applies.

Results:

+-----------+--------+
|travel_time|  format|
+-----------+--------+
|    1:15:00|HH:MM:SS|
|   07:45:00|HH:MM:SS|
|   02:23:00|HH:MM:SS|
|   06:50:00|HH:MM:SS|
+-----------+--------+
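Since the question asks specifically for PATINDEX-like behavior, here is a minimal plain-Python sketch of what the T-SQL expression appears to compute. It rests on several assumptions: the pattern in the question is taken to be '%[^0]%' (first character that is not a zero), trailing-space handling of LEN is ignored, and the helper names `patindex_first_nonzero`, `tsql_substring` and `to_travel_time` are hypothetical, not part of PySpark or T-SQL.

```python
import re


def patindex_first_nonzero(value: str) -> int:
    """1-based position of the first character that is not '0'; 0 if there is none."""
    match = re.search(r"[^0]", value)
    return match.start() + 1 if match else 0


def tsql_substring(value: str, start: int, length: int) -> str:
    """Emulate T-SQL SUBSTRING: 1-based start; a start below 1 shortens the result."""
    end = start + length - 1      # last 1-based position that is included
    begin = max(start, 1)         # positions before 1 do not exist
    if end < begin:
        return ""
    return value[begin - 1:end]


def to_travel_time(hours: str) -> str:
    """Mirror the CASE expression: strip leading zeros, pad a lone digit, add ':00:00'."""
    body = tsql_substring(hours, patindex_first_nonzero(hours), len(hours) + 1)
    prefix = "0" if len(hours) == 1 else ""
    return prefix + body + ":00:00"


for sample in ["5", "012", "10", "0"]:
    print(repr(sample), "->", repr(to_travel_time(sample)))
```

One way to apply such a function to a DataFrame column would be to wrap it with `pyspark.sql.functions.udf`, at the cost of Python UDF overhead. A native-expression version would need to be checked separately against edge cases such as all-zero strings.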



