Find max in Pandas dataframe column of lists

I have a dataframe (df):

df = pd.DataFrame({'A' : [54321, 'it is 54322', 'is it 54323 or 4?', np.nan]})

I can find the numbers in it:

df['B'] = df.A.replace(regex={r'[^\w]':'',r'^\D+':'',r'\D+':' '}).str.split(r'\s')

                   A           B
0              54321         NaN
1        it is 54322     [54322]
2  is it 54323 or 4?  [54323, 4]
3                NaN         NaN

But when I try to find the largest number in each row:

df['C'] = df['B'].apply(lambda x : max(x))

I get:

TypeError: 'float' object is not iterable
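
The error most likely comes from the missing cells in column B: they hold NaN, which is a plain float, and max() needs an iterable. A minimal sketch that reproduces this, assuming column B has the contents shown above (the names b, cell_types and error_message are only for illustration):

```python
import numpy as np
import pandas as pd

# Column B as shown in the question: lists of digit strings, NaN elsewhere.
b = pd.Series([np.nan, ['54322'], ['54323', '4'], np.nan])

# Record the Python type of every cell; the missing ones are floats.
cell_types = b.map(lambda x: type(x).__name__).tolist()
print(cell_types)

# Calling max() on a float cell raises the TypeError from the question.
try:
    max(b[0])
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```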

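Use a lambda function with if-else to skip the values that are not lists, and convert the items to integers so that max compares them numerically rather than as strings: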

f = lambda x : max(int(y) for y in x) if isinstance(x, list) else np.nan
df['C'] = df['B'].apply(f)
print(df)
                   A           B        C
0              54321         NaN      NaN
1        it is 54322     [54322]  54322.0
2  is it 54323 or 4?  [54323, 4]  54323.0
3                NaN         NaN      NaN
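
To see why the integer conversion matters, here is a small sketch. The list digits is a made-up example, not taken from the data above; it is chosen so that the string comparison and the numeric comparison disagree.

```python
# Hypothetical list of digit strings, as str.split would produce them.
digits = ['54323', '9']

# Strings compare character by character, so '9' beats '54323'.
string_max = max(digits)

# Converting each item to int first compares the actual values.
numeric_max = max(int(d) for d in digits)

print(string_max, numeric_max)
```

With the data in the question both approaches happen to agree, but that is not guaranteed in general.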

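Alternatively, use Series.str.extractall, which returns one row per match under a MultiIndex, convert the matches to int, and take the max over the first index level: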

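df = pd.DataFrame({'A' : [54321, 'it is 54322', 'is it 54323 or 4?', np.nan]})
# Series.max(level=0) was removed in pandas 2.0; group by the first index level instead
df['C'] = df.A.astype(str).str.extractall(r'(\d+)')[0].astype(int).groupby(level=0).max()
print(df)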
                   A        C
0              54321  54321.0
1        it is 54322  54322.0
2  is it 54323 or 4?  54323.0
3                NaN      NaN
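
It may help to look at the intermediate result of extractall before it is reduced. The sketch below uses the same data as a standalone Series; the names s, matches and row_max are only for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([54321, 'it is 54322', 'is it 54323 or 4?', np.nan])

# One row per match; the index levels are (original row, match number).
matches = s.astype(str).str.extractall(r'(\d+)')
print(matches)

# Collapse back to one value per original row by grouping on level 0.
row_max = matches[0].astype(int).groupby(level=0).max()
print(row_max)
```

Rows without any digits (row 3 here) do not appear in the result at all, which is why they end up as NaN once the result is assigned back to the dataframe.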

Another solution:

import re
df['B'] = df['A'].apply(lambda x: pd.Series(re.findall(r'\d+', str(x))).astype(float).max())
print(df)

Prints:

                   A        B
0              54321  54321.0
1        it is 54322  54322.0
2  is it 54323 or 4?  54323.0
3                NaN      NaN
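
A possible variant of the same idea that avoids building a pd.Series for every row is a small helper function. This is only a sketch; max_number is a hypothetical name, not part of pandas.

```python
import re

import numpy as np
import pandas as pd


def max_number(value):
    """Return the largest integer found in value, or NaN if there is none."""
    found = re.findall(r'\d+', str(value))
    return max(map(int, found)) if found else np.nan


df = pd.DataFrame({'A': [54321, 'it is 54322', 'is it 54323 or 4?', np.nan]})
df['B'] = df['A'].apply(max_number)
print(df)
```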