Question

This question already has an answer here:

pandas split by last delimiter (1 answer)

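Below, I have a df with a Value column containing various combinations. I want to split the column into two individual columns: First should hold everything before the " - " that starts the trailing uppercase block, and Last should hold that uppercase block (which can itself contain " - ", as in row 1).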

import pandas as pd

df = pd.DataFrame({
    "Value": ["Juan-Diva - HOLLS", "Carlos - George - ESTE BAN - BOM", "Javier Plain - Hotham Ham - ALPINE", "Yul - KONJ KOL MON"],
})

Option 1)

df[["First", "l"]] = df["Value"].str.split(" - ", n=1, expand=True)

df["Last"] = df["Value"].str.split(" - ").str[-1]

Option 2)

# Regular expression pattern
pattern = r"^(.*) - ([A-Z\s]+)$"

# Extract groups into two new columns
df[["First", "Last"]] = df["Value"].str.extract(pattern)

Option 3)

df[["First", "Last"]] = df["Value"].str.rsplit(" - ", n=1, expand=True)

None of these options return the intended output.
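To see where the split-based attempts (options 1 and 3) go wrong, here is a quick check on the row Carlos - George - ESTE BAN - BOM, whose expected Last value is ESTE BAN - BOM. It is a sketch that uses plain-Python str.split / str.rsplit on a single string as a stand-in for the pandas .str.split / .str.rsplit accessors, which apply the same splitting element-wise.

```python
# Row from the question whose uppercase tail itself contains " - "
value = "Carlos - George - ESTE BAN - BOM"

# Option 1: split on the FIRST " - " only, and take the last piece of a full split
first_split = value.split(" - ", maxsplit=1)   # ['Carlos', 'George - ESTE BAN - BOM']
last_piece = value.split(" - ")[-1]            # 'BOM'

# Option 3: split on the LAST " - " only
last_split = value.rsplit(" - ", maxsplit=1)   # ['Carlos - George - ESTE BAN', 'BOM']

print(first_split)
print(last_piece)
print(last_split)
```

Neither the first nor the last " - " is the right cut: the desired split is at the " - " just before ESTE BAN, so a purely position-based split cannot find it without looking at the text on each side.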

Expected output:

                       First            Last
0                  Juan-Diva           HOLLS
1            Carlos - George  ESTE BAN - BOM
2  Javier Plain - Hotham Ham          ALPINE
3                        Yul    KONJ KOL MON

Answer 1

Using Pandas built-in vectorized string operations

import pandas as pd

df = pd.DataFrame({
    "Value": ["Juan-Diva - HOLLS", "Carlos - George - ESTE BAN", "Javier Plain - Hotham Ham - ALPINE", "Yul - KONJ KOL MON"],
})

# Regular expression pattern
pattern = r"^(.*) - ([A-Z\s]+)$"

# Extract groups into two new columns
df[["First", "Last"]] = df["Value"].str.extract(pattern)

# Display the DataFrame
print(df)

Output:

                                Value                      First          Last
0                   Juan-Diva - HOLLS                  Juan-Diva         HOLLS
1          Carlos - George - ESTE BAN            Carlos - George      ESTE BAN
2  Javier Plain - Hotham Ham - ALPINE  Javier Plain - Hotham Ham        ALPINE
3                  Yul - KONJ KOL MON                        Yul  KONJ KOL MON

In this code, the regular expression r"^(.*) - ([A-Z\s]+)$" is used. The pattern has two capture groups:

(.*) captures everything before the last " - ".
([A-Z\s]+)$ captures the trailing run of uppercase letters and whitespace that follows that " - ".
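Because (.*) is greedy, it keeps as much text as it can and gives back only what the rest of the pattern needs, so the split lands on the final " - ". A minimal check with the standard re module on the question's original row 1 (which this answer's sample shortens), contrasted with a lazy (.*?) variant. The lazy pattern, and its assumption that the Last part starts with an uppercase letter and contains only uppercase letters, whitespace and hyphens, are my own illustration, not part of the answer.

```python
import re

value = "Carlos - George - ESTE BAN - BOM"

# The answer's pattern: greedy (.*) keeps as much as possible in group 1
greedy = re.match(r"^(.*) - ([A-Z\s]+)$", value)

# Assumed variant: lazy (.*?) stops at the first " - " after which the rest of the
# string is a single uppercase block (uppercase letters, whitespace, hyphens)
lazy = re.match(r"^(.*?) - ([A-Z][A-Z\s-]*)$", value)

print(greedy.groups())  # ('Carlos - George - ESTE BAN', 'BOM')
print(lazy.groups())    # ('Carlos - George', 'ESTE BAN - BOM')
```

The hyphen at the end of the character class [A-Z\s-] is a literal "-", which is what lets the second group span "ESTE BAN - BOM".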

The .str.extract() method then builds a two-column DataFrame from these capture groups, which is assigned to the First and Last columns.
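str.extract can also name the output columns directly through named groups (?P<name>...), and any row that does not match comes back as NaN. Below is a sketch run on the question's original data, including the "- BOM" row. It assumes, as an illustration only, that Last starts with an uppercase letter and contains only uppercase letters, whitespace and hyphens, and uses a lazy .*? so First stays as short as possible.

```python
import pandas as pd

# The question's original data (row 1 ends in "ESTE BAN - BOM")
df = pd.DataFrame({
    "Value": [
        "Juan-Diva - HOLLS",
        "Carlos - George - ESTE BAN - BOM",
        "Javier Plain - Hotham Ham - ALPINE",
        "Yul - KONJ KOL MON",
    ],
})

# Named groups become the column names of the extracted DataFrame.
# Assumption: Last is one uppercase block (letters, whitespace, hyphens) at the end.
pattern = r"^(?P<First>.*?) - (?P<Last>[A-Z][A-Z\s-]*)$"

parts = df["Value"].str.extract(pattern)  # columns: First, Last
df = df.join(parts)                       # aligns on the shared index
print(df[["First", "Last"]])
```

If a value could have an all-uppercase segment in the middle followed by a mixed-case one, the lazy match still moves past it, because the second group must run to the end of the string without any lowercase letter.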

Alternative: using re

import pandas as pd
import re

df = pd.DataFrame({
    "Value": ["Juan-Diva - HOLLS", "Carlos - George - ESTE BAN", "Javier Plain - Hotham Ham - ALPINE", "Yul - KONJ KOL MON"],
})

# Function to split the string
def split_value(s):
    # Find the last occurrence of " - " followed by uppercase letters/whitespace up to the end
    match = re.search(r"(.*) - ([A-Z\s]+)$", s)
    if match:
        return match.group(1), match.group(2)
    else:
        return s, None

# Apply the function to each value in the "Value" column
df[["First", "Last"]] = df["Value"].apply(split_value).tolist()

print(df)

Output:

                                Value                      First          Last
0                   Juan-Diva - HOLLS                  Juan-Diva         HOLLS
1          Carlos - George - ESTE BAN            Carlos - George      ESTE BAN
2  Javier Plain - Hotham Ham - ALPINE  Javier Plain - Hotham Ham        ALPINE
3                  Yul - KONJ KOL MON                        Yul  KONJ KOL MON

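I'm using re.search to find a match for the pattern. If a match is found, the function returns the two parts of the split; otherwise, it returns the original string for First and None for Last. The results are then converted to a list and assigned to the First and Last columns.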
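A quick sketch of the fallback branch: the function below repeats split_value from this answer (with the quotes and the \s restored) and feeds it hypothetical inputs that are not in the question's data, one with no " - " at all and one whose tail is not all uppercase.

```python
import re

def split_value(s):
    # Same logic as the answer: greedy split before a trailing uppercase/whitespace block
    match = re.search(r"(.*) - ([A-Z\s]+)$", s)
    if match:
        return match.group(1), match.group(2)
    return s, None

# Hypothetical inputs, not from the question
print(split_value("NoSeparatorHere"))     # ('NoSeparatorHere', None)
print(split_value("Alpha - Beta"))        # ('Alpha - Beta', None)
print(split_value("Yul - KONJ KOL MON"))  # ('Yul', 'KONJ KOL MON')
```

A value like "Alpha - Beta" is left whole in First, because "Beta" contains lowercase letters and so never satisfies ([A-Z\s]+)$.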

Answer 2

Try:

df[["First", "Last"]] = df["Value"].str.rsplit(" - ", n=1, expand=True)
print(df)

Prints:

                                Value                      First          Last
0                   Juan-Diva - HOLLS                  Juan-Diva         HOLLS
1          Carlos - George - ESTE BAN            Carlos - George      ESTE BAN
2  Javier Plain - Hotham Ham - ALPINE  Javier Plain - Hotham Ham        ALPINE
3                  Yul - KONJ KOL MON                        Yul  KONJ KOL MON
