How to use the apply function on multiple columns at once
Is it possible to call the apply function on multiple columns in pandas? If so, how does one do this... e.g.,
df['Duration'] = df['Hours', 'Mins', 'Secs'].apply(lambda x,y,z: timedelta(hours=x, minutes=y, seconds=z))
Thanks.
Use apply on the DataFrame with axis=1
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html
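import pandas as pd
triangles = [{ 'base': 20, 'height': 9 }, { 'base': 10, 'height': 7 }, { 'base': 40, 'height': 4 }]
triangles_df = pd.DataFrame(triangles)
def calculate_area(row):
    # row is a Series holding one row's values, indexed by column name
    return row['base'] * row['height'] * 0.5
# axis=1 calls calculate_area once per row
triangles_df.apply(calculate_area, axis=1)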
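For a function that takes several scalar arguments, as in the question, one option is to unpack each row into it. This is a minimal sketch, assuming the selected columns are in the same order as the function's parameters; `make_duration` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({"Hours": [1, 0], "Mins": [2, 30], "Secs": [10, 5]})

def make_duration(hours, mins, secs):
    # Hypothetical helper: builds one Timedelta from three scalar values.
    return pd.Timedelta(hours=hours, minutes=mins, seconds=secs)

# axis=1 passes each row as a Series; *row unpacks its values in column order.
df["Duration"] = df[["Hours", "Mins", "Secs"]].apply(
    lambda row: make_duration(*row), axis=1
)
print(df)
```

Selecting the columns explicitly keeps the unpacking order under your control even if the DataFrame later gains other columns.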
Good luck!
This might help.
import pandas as pd
import datetime as DT
df = pd.DataFrame({"Hours": [1], "Mins": [2], "Secs": [10]})
df = df.astype(int)
# Index the row by column label; int() converts the NumPy integers, which datetime.timedelta may reject
df['Duration'] = df[['Hours', 'Mins', 'Secs']].apply(lambda x: DT.timedelta(hours=int(x['Hours']), minutes=int(x['Mins']), seconds=int(x['Secs'])), axis=1)
print(df)
print(df["Duration"])
Output:
   Hours  Mins  Secs  Duration
0      1     2    10  01:02:10
0   01:02:10
dtype: timedelta64[ns]
You should use:
df['Duration'] = pd.to_timedelta(df.Hours*3600 + df.Mins*60 + df.Secs, unit='s')
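The expression above converts every row to a total number of seconds and lets pandas build all the timedeltas in one vectorized call. Here is a small sketch, with made-up sample values, checking that the result matches a Timedelta built by hand from the same components.

```python
import pandas as pd

df = pd.DataFrame({"Hours": [1, 4], "Mins": [2, 40], "Secs": [10, 41]})

# Total seconds per row, computed column-wise (no Python-level loop).
total_seconds = df.Hours * 3600 + df.Mins * 60 + df.Secs
df["Duration"] = pd.to_timedelta(total_seconds, unit="s")

# Sanity check against a Timedelta built from the first row's components.
expected_first = pd.Timedelta(hours=1, minutes=2, seconds=10)
print(df["Duration"].iloc[0] == expected_first)
```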
When you use apply on a DataFrame with axis=1, it is a row-wise calculation, so typically this syntax makes sense:
df['Duration'] = df.apply(lambda row: pd.Timedelta(hours=row.Hours, minutes=row.Mins,
                                                   seconds=row.Secs), axis=1)
Some timings
import pandas as pd
import numpy as np
df = pd.DataFrame({'Hours': np.tile([1,2,3,4],50),
                   'Mins': np.tile([10,20,30,40],50),
                   'Secs': np.tile([11,21,31,41],50)})
%timeit pd.to_timedelta(df.Hours*3600 + df.Mins*60 + df.Secs, unit='s')
#432 µs ± 5.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df.apply(lambda row: pd.Timedelta(hours=row.Hours, minutes=row.Mins, seconds=row.Secs), axis=1)
#12 ms ± 67.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
As always, apply should be a last resort.
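The %timeit lines above only run inside IPython or Jupyter. Outside of those, a rough equivalent can be sketched with the standard-library timeit module. The repetition count of 20 is an arbitrary choice, and absolute numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({"Hours": np.tile([1, 2, 3, 4], 50),
                   "Mins": np.tile([10, 20, 30, 40], 50),
                   "Secs": np.tile([11, 21, 31, 41], 50)})

def vectorized():
    # One column-wise computation for the whole frame.
    return pd.to_timedelta(df.Hours * 3600 + df.Mins * 60 + df.Secs, unit="s")

def row_wise():
    # One Python-level function call per row.
    return df.apply(lambda row: pd.Timedelta(hours=row.Hours, minutes=row.Mins,
                                             seconds=row.Secs), axis=1)

# Both approaches should produce the same values before comparing speed.
assert (vectorized() == row_wise()).all()

n = 20  # arbitrary number of repetitions
t_vec = timeit.timeit(vectorized, number=n) / n
t_row = timeit.timeit(row_wise, number=n) / n
print(f"vectorized: {t_vec * 1e3:.3f} ms per call")
print(f"row-wise:   {t_row * 1e3:.3f} ms per call")
```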