How to split a column in a Pandas dataframe into alphabetic and numeric values?
I have a dataframe:
Name Section
1 James P3
2 Sam 2.5C
3 Billy T35
4 Sarah A85
5 Felix 5I
How can I split the numeric values into a separate column named Section_Number, and the alphabetic values into Section_Letter?
Desired result:
Name Section Section_Number Section_Letter
1 James P3 3 P
2 Sam 2.5C 2.5 C
3 Billy T35 35 T
4 Sarah A85 85 A
5 Felix 5I 5 I
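Use str.replace together with str.extract, with the pattern [A-Z]+ to match runs of uppercase letters:
# regex=True is required in pandas 2.0+, where str.replace defaults to literal matching
df['Section_Number'] = df['Section'].str.replace('[A-Z]+', '', regex=True)
# expand=False returns a Series rather than a one-column DataFrame
df['Section_Letter'] = df['Section'].str.extract('([A-Z]+)', expand=False)
print(df)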
Name Section Section_Number Section_Letter
1 James P3 3 P
2 Sam 2.5C 2.5 C
3 Billy T35 35 T
4 Sarah A85 85 A
5 Felix 5I 5 I
To match lowercase letters as well:
df['Section_Number'] = df['Section'].str.replace('[A-Za-z]+', '', regex=True)
df['Section_Letter'] = df['Section'].str.extract('([A-Za-z]+)', expand=False)
print(df)
Name Section Section_Number Section_Letter
1 James P3 3 P
2 Sam 2.5C 2.5 C
3 Billy T35 35 T
4 Sarah A85 85 A
5 Felix 5I 5 I
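For anyone who wants to paste this into a fresh session, here is a self-contained sketch of the same approach. It rebuilds the sample frame from the question and passes regex=True explicitly, since pandas 2.0 and later treat the str.replace pattern as a literal string by default. The final pd.to_numeric step is my own addition, not part of the answer above, and assumes you want a numeric column rather than strings.

```python
import pandas as pd

# Sample data copied from the question (index 1..5 to match the printout).
df = pd.DataFrame(
    {
        "Name": ["James", "Sam", "Billy", "Sarah", "Felix"],
        "Section": ["P3", "2.5C", "T35", "A85", "5I"],
    },
    index=range(1, 6),
)

# Remove every run of letters; what is left is the numeric part (still a string).
df["Section_Number"] = df["Section"].str.replace(r"[A-Za-z]+", "", regex=True)

# Capture the first run of letters; expand=False gives back a Series.
df["Section_Letter"] = df["Section"].str.extract(r"([A-Za-z]+)", expand=False)

# Optional extra step (an assumption, not in the answer): convert to numbers.
df["Section_Number"] = pd.to_numeric(df["Section_Number"])

print(df)
```

Without the last line, Section_Number stays an object column of strings, which is what the printed output above shows.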
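It will no doubt be slower, but to offer an alternative for completeness: you can use str.extractall to capture the named groups that match the pattern, consolidate the matches per row, and join them back onto your DF:
new = df.join(
    df.Section.str.extractall(r'(?i)(?P<Section_Letter>[A-Z]+)|(?P<Section_Number>[\d.]+)')
    .groupby(level=0).first()
)
Result: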
Name Section Section_Letter Section_Number
1 James P3 P 3
2 Sam 2.5C C 2.5
3 Billy T35 T 35
4 Sarah A85 A 85
5 Felix 5I I 5
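To see why the groupby(level=0).first() step is needed, it can help to look at the intermediate frame. The sketch below rebuilds the question's sample data and splits the one-liner into steps; the variable names pattern, matches and collapsed are just illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["James", "Sam", "Billy", "Sarah", "Felix"],
        "Section": ["P3", "2.5C", "T35", "A85", "5I"],
    },
    index=range(1, 6),
)

pattern = r"(?i)(?P<Section_Letter>[A-Z]+)|(?P<Section_Number>[\d.]+)"

# One row per match, indexed by (original row label, match number).
# Because the groups are alternatives, each row fills only one column
# and leaves the other as NaN.
matches = df["Section"].str.extractall(pattern)
print(matches)

# first() takes the first non-null value per column within each original
# row, which collapses the matches back to one row per Section value.
collapsed = matches.groupby(level=0).first()

new = df.join(collapsed)
print(new)
```

Both new columns come back as strings here, so Section_Number would still need a numeric conversion if you plan to do arithmetic on it.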
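If, as in your example, each Section value contains exactly one letter, you can sort the characters and then slice:
def get_vals(x):
    # stable sort: non-letters keep their order and come first, the letter goes last
    return ''.join(sorted(x, key=str.isalpha))
# apply ordering
vals = df['Section'].apply(get_vals)
# split numbers from letter
df['num'] = vals.str[:-1].astype(float)
df['letter'] = vals.str[-1]
print(df)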
Name Section num letter
1 James P3 3.0 P
2 Sam 2.5C 2.5 C
3 Billy T35 35.0 T
4 Sarah A85 85.0 A
5 Felix 5I 5.0 I
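A quick way to check what the sort is doing, and where the single-letter assumption bites, is to call get_vals on a few strings directly. The value 'AB12' below is a made-up example, not something from the question's data.

```python
def get_vals(x):
    # str.isalpha gives False for digits and '.', True for letters.
    # False sorts before True and sorted() is stable, so the numeric
    # characters keep their order and move to the front.
    return "".join(sorted(x, key=str.isalpha))

print(get_vals("P3"))    # 3P
print(get_vals("2.5C"))  # 2.5C
print(get_vals("AB12"))  # 12AB
```

With two letters, slicing off only the last character leaves '12A', which cannot be converted with astype(float), so this approach is only a fit when every value has exactly one letter.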
We can use itertools.groupby to group consecutive alphabetic and non-alphabetic characters:
from itertools import groupby
[sorted([''.join(x) for _, x in groupby(s, key=str.isalpha)]) for s in df.Section]
[['3', 'P'], ['2.5', 'C'], ['35', 'T'], ['85', 'A'], ['5', 'I']]
We can then turn that into new columns:
from itertools import groupby
N, L = zip(
    *[sorted([''.join(x) for _, x in groupby(s, key=str.isalpha)]) for s in df.Section]
)
df.assign(Section_Number=N, Section_Letter=L)
Name Section Section_Number Section_Letter
1 James P3 3 P
2 Sam 2.5C 2.5 C
3 Billy T35 35 T
4 Sarah A85 85 A
5 Felix 5I 5 I
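In case the comprehension is hard to read, here is a small sketch of what groupby produces for a single string before the sorted() call. The helper name split_runs is hypothetical and only there for illustration.

```python
from itertools import groupby

def split_runs(s):
    # groupby starts a new group whenever the key changes, so each group
    # is a run of consecutive letters or consecutive non-letters.
    return [
        (is_alpha, "".join(chars))
        for is_alpha, chars in groupby(s, key=str.isalpha)
    ]

print(split_runs("2.5C"))  # [(False, '2.5'), (True, 'C')]
print(split_runs("T35"))   # [(True, 'T'), (False, '35')]

# Sorting the pieces puts the number first because digits and '.'
# compare lower than ASCII letters in Python's string ordering.
print(sorted(piece for _, piece in split_runs("T35")))  # ['35', 'T']
```

This is why the answer can unpack the sorted pairs straight into N (numbers) and L (letters), as long as each value has one numeric run and one letter run.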