How to apply a custom function to selected columns of a pandas DataFrame and create a new column from the result
My DataFrame (df) has 47 columns and 30,000 rows. The columns are:
Index(['Unnamed: 0', 'CtpJobId', 'TransformJobStateId', 'LastError',
'PriorityDate', 'QueuedTime', 'AccurateAsOf', 'SentToDevice',
'StartedAtDevice', 'ProcessStart', 'LastProgressAt', 'ProcessEnd',
'OutputFileDuration', 'Tags', 'SegmentId', 'VideoId',
'ClipFirstFrameNumber', 'ClipLastFrameNumber', 'SourceId',
'SourceNamedLocation', 'SourceDirectory', 'SourceFileSize',
'srcMediaFormat', 'srcFrameRate', 'srcWidth', 'srcHeight', 'srcCodec',
'srcDuration', 'TargetId', 'TargetNamedLocation', 'TargetDirectory',
'TargetFilename', 'Description', 'TargetTags', 'tgtFrameRate',
'tgtDropFrame', 'tgtWidth', 'tgtHeight', 'tgtCodec', 'DeviceType',
'DeviceResourceId', 'AssignedDeviceId', 'DeviceName',
'AssignedDeviceJobId', 'DeviceUri'],
dtype='object')
I want to apply a function to selected columns of this DataFrame to create a new column, df['seg_duration']. My function is:
def seq_duration(df):
    if ClipFirstFrameNumber is not None and ClipLastFrameNumber is not None:
        fn = ClipLastFrameNumber - ClipFirstFrameNumber
        if FrameRate == '23.98' and DropFrame == 'False':
            fps = 24 / 1.001
        elif FrameRate == '24' and DropFrame == 'False':
            fps = 24
        elif FrameRate == '25' and DropFrame == 'False':
            fps = 25
        elif FrameRate == '29.97':
            fps = 30 / 1.001
        elif FrameRate == '30' and DropFrame == 'False':
            fps = 30
        elif FrameRate == '59.94':
            fps = 60 / 1.001
        Duration = fn/fps
    elif srcDuration is not None:
        Duration = srcDuration
    else:
        None
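The function has three cases, and the first case has several conditions of its own. First I subtract the ClipFirstFrameNumber column from ClipLastFrameNumber and store the result in fn, then pick fps from the frame rate and drop-frame flag. The other branches work the same way: srcDuration is a column, and I use its value. Some sample rows: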
ClipLastFrameNumber  ClipFirstFrameNumber  tgtDropFrame  tgtFrameRate
                NaN                   NaN          True         29.97
                NaN                   NaN          True         29.97
                NaN                   NaN          True         29.97
            34354.0               28892.0          True         29.97
When I apply the function like this:
df['seg_duration']=df.apply(seq_duration)
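I get the error NameError: ("name 'ClipFirstFrameNumber' is not defined", 'occurred at index Unnamed: 0')
Is this the right way to write a function for pandas? If not, how can I use this function on the DataFrame to create the new column df['seg_duration']? Thanks in advance.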
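For context on the NameError: by default `DataFrame.apply` uses `axis=0` and passes each column to the function as a Series, which is why the message says the failure occurred at index 'Unnamed: 0' (the first column). Bare names such as `ClipFirstFrameNumber` are also never defined inside the function. The sketch below illustrates the difference between the two axes on a toy frame; the values and the helper `describe` are made up for illustration.

```python
import pandas as pd

# Toy frame: column names are borrowed from the question, values are invented.
df = pd.DataFrame({
    "ClipFirstFrameNumber": [0.0, 10.0],
    "ClipLastFrameNumber": [24.0, 58.0],
})

def describe(series):
    # Return the name of the Series that apply() hands to the function.
    return series.name

# Default axis=0: the function is called once per column.
per_column = df.apply(describe)

# axis=1: the function is called once per row; the name is the row's index label.
per_row = df.apply(describe, axis=1)

# With axis=1 each row is a Series indexed by column name, so fields are reachable.
frame_counts = df.apply(
    lambda row: row["ClipLastFrameNumber"] - row["ClipFirstFrameNumber"], axis=1
)
```

Here `per_column` holds the column names and `per_row` holds the row labels, which shows what the function actually receives in each mode.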
Modify your function slightly so that it reads each value from the row it receives:
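import pandas as pd

def seq_duration(row):
    # tgtFrameRate / tgtDropFrame are assumed to be the intended columns (the frame has
    # no FrameRate or DropFrame column). The comparisons assume both hold strings;
    # compare against numbers / booleans instead if your dtypes differ.
    Duration = None
    # Missing values are NaN, and "NaN is not None" is True, so test with pd.notna().
    if pd.notna(row.ClipFirstFrameNumber) and pd.notna(row.ClipLastFrameNumber):
        fn = row.ClipLastFrameNumber - row.ClipFirstFrameNumber
        fps = 0
        if row.tgtFrameRate == '23.98' and row.tgtDropFrame == 'False':
            fps = 24 / 1.001
        elif row.tgtFrameRate == '24' and row.tgtDropFrame == 'False':
            fps = 24
        elif row.tgtFrameRate == '25' and row.tgtDropFrame == 'False':
            fps = 25
        elif row.tgtFrameRate == '29.97':
            fps = 30 / 1.001
        elif row.tgtFrameRate == '30' and row.tgtDropFrame == 'False':
            fps = 30
        elif row.tgtFrameRate == '59.94':
            fps = 60 / 1.001
        # Duration stays None when the frame rate is not recognised.
        if fps > 0:
            Duration = fn / fps
    elif pd.notna(row.srcDuration):
        Duration = row.srcDuration
    return Duration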
Then call it with axis=1 so that the function is applied to each row:
df['seg_duration'] = df.apply(seq_duration, axis=1)
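With 30,000 rows the row-wise apply is workable, but the same logic can probably be expressed with column operations, which usually run faster. The following is only a sketch under stated assumptions: the sample values are invented, `tgtFrameRate` is assumed to hold strings, the `FPS` lookup table is a hypothetical helper, and the drop-frame check is left out for brevity, so it is not a drop-in replacement for the function above.

```python
import numpy as np
import pandas as pd

# Made-up sample rows; column names follow the question's frame.
df = pd.DataFrame({
    "ClipFirstFrameNumber": [np.nan, 28892.0, 0.0],
    "ClipLastFrameNumber": [np.nan, 34354.0, 48.0],
    "tgtFrameRate": ["29.97", "29.97", "24"],
    "srcDuration": [12.5, 99.0, np.nan],
})

# Nominal rate -> exact frames per second (assumed mapping; drop-frame flag ignored).
FPS = {"23.98": 24 / 1.001, "24": 24.0, "25": 25.0,
       "29.97": 30 / 1.001, "30": 30.0, "59.94": 60 / 1.001}

fps = df["tgtFrameRate"].map(FPS)  # NaN for rates that are not in the table
frames = df["ClipLastFrameNumber"] - df["ClipFirstFrameNumber"]
from_frames = frames / fps

# Fall back to srcDuration only when a frame number is missing,
# mirroring the if / elif structure of the row-wise function.
has_frames = df["ClipFirstFrameNumber"].notna() & df["ClipLastFrameNumber"].notna()
df["seg_duration"] = from_frames.where(has_frames, df["srcDuration"])
```

Rows with both frame numbers get a duration computed from the frame count, and rows without them take `srcDuration`. An unrecognised frame rate yields NaN rather than None.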