Question

我不得不将以下的Kallkry声明移至Kalls花园。

select case when LEN(travel_time) = 1 then
concat( 0  , SUBSTRING(travel_time,PATINDEX( % ^0 % , travel_time ), LEN(travel_time)+1), :00:00 )
else concat( SUBSTRING(travel_time,PATINDEX( % ^0  , travel_time ), LEN(travel_time)+1), :00:00 ) end as travel_time from sampleTable

除这一产出外,即提供解决办法的实际MSQ,我们需要类似于PATINDEX运行的花园。

Answer 1

I have tried the below approach in pyspark:

from pyspark.sql.functions import concat, lpad, regexp_extract, length, when
dilip_df = spark.createDataFrame(
    [("1:15:00",), ("07:45:00",), ("0 2:23:00",), ("0 6:50:00",)],
    [ travel_time ]
)
dilip_df.select(
    concat(
        when(
            length(dilip_df[ travel_time ]) == 1,
            concat(
                lpad(
                    regexp_extract(dilip_df[ travel_time ], r d+ , 0),
                    2,  0 
                ),
                regexp_extract(dilip_df[ travel_time ], r :d+:d+ , 0)
            )
        ).otherwise(
            regexp_extract(dilip_df[ travel_time ], r d:?d:d+ , 0)
        )
    ).alias( travel_time )
).show()

+-----------+
|travel_time|
+-----------+
|      15:00|
|      07:45|
|      23:00|
|      50:00|
+-----------+

以上代码:travel_time栏

If the length of travel_time is 1, it means there is only a single digit so

Extract the numeric part from the travel_time column using regexp_extract. Left pad the extracted numeric part with zeros to make it a two-digit number using lpad. Concatenate the padded numeric part with the rest of the original time string obtained by using regexp_extract. If the length is not 1, it means there is more than one digit so Extract the time part using regexp_extract. The final result is an alias column named travel_time containing the transformed values.

采用代号:

from pyspark.sql.functions import expr
dilip_df = spark.createDataFrame(
    [("1:15:00",), ("07:45:00",), ("02:23:00",), ("06:50:00",)],
    [ travel_time ]
)
dilip_df.createOrReplaceTempView("dilip_table")
query = """
    SELECT 
        travel_time,
        CASE 
            WHEN REGEXP_EXTRACT(travel_time,  ^\d{1,2}:\d{2}:\d{2}$ ) IS NOT NULL THEN  HH:MM:SS 
            WHEN REGEXP_EXTRACT(travel_time,  ^0\d{1}:\d{2}:\d{2}$ ) IS NOT NULL THEN  0H:MM:SS 
            WHEN REGEXP_EXTRACT(travel_time,  ^0\d{1} \d{1,2}:\d{2}:\d{2}$ ) IS NOT NULL THEN  0H HM:SS 
            ELSE NULL 
        END AS format
    FROM dilip_table
"""
result = spark.sql(query)
result.show()

In spark.sql code It selects the travel_time column with transformations applied using the CONCAT and CASE WHEN expressions based on the length of the travel_time column. If the length is 1, it extracts and manipulates the numeric part of the travel_time using regexp_extract, lpad, and concat. If the length is not 1, it extracts the time part using regexp_extract.

<>Results>

+-----------+--------+
|travel_time|  format|
+-----------+--------+
|    1:15:00|HH:MM:SS|
|   07:45:00|HH:MM:SS|
|   02:23:00|HH:MM:SS|
|   06:50:00|HH:MM:SS|
+-----------+--------+

友情链接