Find max in Pandas dataframe column of lists
I have a dataframe (df):
df = pd.DataFrame({'A': [54321, 'it is 54322', 'is it 54323 or 4?', np.nan]})
I can find the numbers in it:
df['B'] = df.A.replace(regex={r'[^\w]': '', r'^\D+': '', r'\D+': ' '}).str.split(r'\s')
                   A           B
0              54321         NaN
1        it is 54322     [54322]
2  is it 54323 or 4?  [54323, 4]
3                NaN         NaN
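To see what the replace call above does to a single value, here is a minimal sketch that applies the same three patterns with re.sub, one after another. It assumes pandas applies the dict entries sequentially in dict order, which is consistent with the output shown; clean_steps is a hypothetical helper name.

```python
import re

# The three (pattern, replacement) pairs from the question, in dict order.
PATTERNS = [
    (r'[^\w]', ''),   # drop every non-word character, including spaces
    (r'^\D+', ''),    # drop the leading run of non-digits
    (r'\D+', ' '),    # turn each remaining run of non-digits into one space
]

def clean_steps(text):
    """Hypothetical helper: return the string after each substitution."""
    steps = []
    for pattern, repl in PATTERNS:
        text = re.sub(pattern, repl, text)
        steps.append(text)
    return steps

steps = clean_steps('is it 54323 or 4?')
print(steps)                       # ['isit54323or4', '54323or4', '54323 4']
print(re.split(r'\s', steps[-1]))  # ['54323', '4']
```

Note that the final pieces are strings such as '54323', not integers.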
But when I try to find the largest number in each row:
df['C'] = df['B'].apply(lambda x : max(x))
I get:
TypeError: 'float' object is not iterable
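The error comes from the rows where B holds NaN: NaN is a float, not a list, so max has nothing to iterate over. A quick check, rebuilding the same data as above, is to look at the type of every cell (cell_types is just an illustrative name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [54321, 'it is 54322', 'is it 54323 or 4?', np.nan]})
df['B'] = df.A.replace(regex={r'[^\w]': '', r'^\D+': '', r'\D+': ' '}).str.split(r'\s')

# Rows 0 and 3 hold NaN (a float); rows 1 and 2 hold lists of strings.
cell_types = df['B'].map(lambda v: type(v).__name__)
print(cell_types.tolist())  # ['float', 'list', 'list', 'float']

try:
    max(np.nan)
except TypeError as exc:
    print(exc)  # 'float' object is not iterable
```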
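The NaN cells are floats, not lists, so max(x) has nothing to iterate over. Use a lambda function with if-else to return NaN for non-list cells, and also convert each element to int to get the correct max, because the split produces strings and max compares strings lexicographically (max(['9', '10']) returns '9'):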
f = lambda x : max(int(y) for y in x) if isinstance(x, list) else np.nan
df['C'] = df['B'].apply(f)
print (df)
                   A           B        C
0              54321         NaN      NaN
1        it is 54322     [54322]  54322.0
2  is it 54323 or 4?  [54323, 4]  54323.0
3                NaN         NaN      NaN
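The lambda above assumes every list contains only digit strings; int('') would raise ValueError if a cell ever held an empty piece. A slightly more defensive sketch, under that assumption, filters the pieces first and uses the default argument of max for lists that end up empty. max_of_digit_list and the two extra sample cells are hypothetical, not part of the original data.

```python
import numpy as np
import pandas as pd

def max_of_digit_list(cell):
    """Hypothetical helper: largest integer in a list of digit strings, else NaN."""
    if not isinstance(cell, list):
        return np.nan  # NaN cells (floats) and any other non-list value
    # Keep only non-empty strings made of decimal digits; '' is skipped.
    numbers = [int(s) for s in cell if isinstance(s, str) and s.isdecimal()]
    # Compare as ints: on strings, max(['9', '10']) would return '9'.
    return max(numbers, default=np.nan)

# Cells shaped like column B, plus two hypothetical edge cases.
B = pd.Series([np.nan, ['54322'], ['54323', '4'], [''], ['9', '10']])
result = B.apply(max_of_digit_list)
print(result.tolist())
```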
Or use Series.str.extractall, which returns one row per match under a MultiIndex, convert the matches to int, and take the max per first index level:
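df = pd.DataFrame({'A': [54321, 'it is 54322', 'is it 54323 or 4?', np.nan]})
# max(level=0) was deprecated in pandas 1.3 and removed in 2.0; group on level 0 instead
df['C'] = df.A.astype(str).str.extractall(r'(\d+)')[0].astype(int).groupby(level=0).max()
print(df)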
                   A        C
0              54321  54321.0
1        it is 54322  54322.0
2  is it 54323 or 4?  54323.0
3                NaN      NaN
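It can help to look at what extractall returns before the max is taken: one row per match, indexed by the original row label plus a level named 'match'. A minimal sketch (matches and row_max are illustrative names); row 3 becomes the string 'nan' after astype(str), has no digits, and therefore gets NaN when the result is aligned back onto df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [54321, 'it is 54322', 'is it 54323 or 4?', np.nan]})

# One row per regex match; column 0 holds the single capture group.
matches = df.A.astype(str).str.extractall(r'(\d+)')[0].astype(int)
print(matches)
# Level 0 is the original row label, level 'match' numbers the matches within a row.
print(list(matches.index.names))  # [None, 'match']

# groupby(level=0).max() is the current spelling of the old max(level=0).
row_max = matches.groupby(level=0).max()
df['C'] = row_max  # aligned on the index, so row 3 gets NaN
print(df)
```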
Another solution:
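import re
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [54321, 'it is 54322', 'is it 54323 or 4?', np.nan]})
df['B'] = df['A'].apply(lambda x: pd.Series(re.findall(r'\d+', str(x))).astype(float).max())
print(df)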
Prints:
                   A        B
0              54321  54321.0
1        it is 54322  54322.0
2  is it 54323 or 4?  54323.0
3                NaN      NaN
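The same idea can be written without building a pd.Series for every row: re.findall already returns a list of strings, and the built-in max accepts a default for rows with no digits (Python 3.4+). A sketch, where largest_number is a hypothetical helper name:

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [54321, 'it is 54322', 'is it 54323 or 4?', np.nan]})

def largest_number(value):
    """Hypothetical helper: largest integer found in str(value), or NaN if none."""
    # str(np.nan) is 'nan', which has no digits, so that row falls back to NaN.
    return max(map(int, re.findall(r'\d+', str(value))), default=np.nan)

df['B'] = df['A'].apply(largest_number)
print(df)
```

Skipping the per-row Series is usually cheaper on large frames; the column still comes out as float because of the NaN row.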