Extract values before specific characters in one column of a DataFrame in Python
How can I extract the area values from the address column of the following dataframe?
address quantity price
0 711-2880 Nulla St. Mankato Mississippi 96.5㎡ 2 20
1 P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 206㎡ 3 13
2 606-3727 Ullamcorper. Street Roseville NH 115㎡ 11523 (786) 713-8616 5 23
3 Ap #867-859 Sit Rd. Azusa New York 39 square metre 3 32
4 7292 Dictum Av. San Antonio MI 470㎡ 47096 (492) 709-6392 5 45
Note that the area is the value that appears before ㎡ or square metre.
The desired output would look like this:
address area quantity price
0 711-2880 Nulla St. Mankato Mississippi 96.5㎡ 96.5 2 20
1 P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 206㎡ 206.0 3 13
2 606-3727 Ullamcorper. Street Roseville NH 115㎡ 11523 (786) 713-8616 115.0 5 23
3 Ap #867-859 Sit Rd. Azusa New York 39 square metre 39.0 3 32
4 7292 Dictum Av. San Antonio MI 470㎡ 47096 (492) 709-6392 470.0 5 45
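Use str.extract with a lookahead, so that a number is captured only when it is directly followed by ㎡ or square metre. For example: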
import pandas as pd
df = pd.DataFrame({'address': ['711-2880 Nulla St. Mankato Mississippi 96.5㎡', 'P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 206㎡', '606-3727 Ullamcorper. Street Roseville NH 115㎡ 11523 (786) 713-8616', 'Ap #867-859 Sit Rd. Azusa New York 39 square metre', '7292 Dictum Av. San Antonio MI 470㎡ 47096 (492) 709-6392']})
# Capture the number that is followed (after optional whitespace) by ㎡ or "square metre"
df['area'] = df['address'].str.extract(r"(\d+\.?\d*)\s*(?=㎡|\bsquare metre\b)")
print(df)
Output:
address area
0 711-2880 Nulla St. Mankato Mississippi 96.5㎡ 96.5
1 P.O. Box 283 8562 Fusce Rd. Frederick Nebraska... 206
2 606-3727 Ullamcorper. Street Roseville NH 115㎡... 115
3 Ap #867-859 Sit Rd. Azusa New York 39 square m... 39
4 7292 Dictum Av. San Antonio MI 470㎡ 47096 (492... 470
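In the output above the extracted values are still text (206 rather than 206.0), whereas the desired output shows floats. A minimal sketch of one way to convert them, assuming that rows without a match should become NaN; the last row is a hypothetical address added only to illustrate that case, and the name area_text is illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "address": [
        "711-2880 Nulla St. Mankato Mississippi 96.5㎡",
        "Ap #867-859 Sit Rd. Azusa New York 39 square metre",
        "7292 Dictum Av. San Antonio MI 470㎡ 47096 (492) 709-6392",
        "No area given here",  # hypothetical row: shows the no-match case
    ]
})

# Same pattern as in the answer: a number followed by ㎡ or "square metre"
pattern = r"(\d+\.?\d*)\s*(?=㎡|\bsquare metre\b)"

# expand=False returns a Series (not a DataFrame) for a single capture group
area_text = df["address"].str.extract(pattern, expand=False)

# Convert the matched strings to numbers; unmatched rows remain NaN
df["area"] = pd.to_numeric(area_text)
print(df["area"].tolist())
```

Because the lookahead requires the unit right after the number, trailing digits such as the postcode and phone number in the third address are not picked up.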