pandas DataFrame: match a dataframe and a dict by intervals
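I have a question about DataFrames. I have a dataframe of 0.1-second intervals together with the features belonging to each interval. I want to add a column containing the prediction from an earlier algorithm (whether the interval is silence or sound). I have a dictionary that holds all predicted silent intervals for each recording. My dataframe looks like the following; here df is filtered on audio_id==0 and sorted on interval_x.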
     audio_id  interval_x  interval_y  predicted_value
0           0    0.579367    0.679367                0
1           0    0.679367    0.779367                0
2           0    0.779367    0.879367                0
3           0    0.879367    0.979367                0
4           0    0.979367    1.079367                0
..        ...         ...         ...              ...
518         0   50.805830   50.905830                0
519         0   50.905830   51.005830                0
520         0   51.005830   51.105830                0
521         0   51.105830   51.205830                0
522         0   51.205830   51.212938                0
My dictionary with the silent intervals looks like this:
{'0': [[1.4501383219954658, 2.058138321995466],
[3.298138321995466, 4.762138321995465],
[7.682138321995467, 8.266138321995465],
[11.266138321995466, 11.938138321995465],
[13.242138321995466, 13.706138321995466],
[16.73013832199547, 17.82613832199547],
[24.53813832199547, 25.130138321995467],
[26.394138321995467, 27.042138321995466],
[28.21013832199547, 28.722138321995466]],
'1': [[0.0, 0.31253968253968023],
[4.296539682539681, 5.040539682539681],
[8.64053968253968, 9.296539682539679],
and so on for each audio file.
What would be an efficient way to do this?
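Here is a solution that uses merge_asof to match each interval with the nearest silent span that starts at or before the interval's start. d is the dictionary from the question and intervals is the dataframe.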
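import pandas as pd

# Flatten the dictionary into one row per silent span.
silent_times = pd.DataFrame.from_records(
    [(file, from_time, to_time) for file, values in d.items()
     for from_time, to_time in values],
    columns=["audio_id", "from_time", "to_time"])
silent_times["audio_id"] = silent_times["audio_id"].astype(int)

parts = []
for inx in intervals.audio_id.unique():
    intervals_slice = intervals[intervals.audio_id == inx]
    silent_times_slice = silent_times[silent_times.audio_id == inx]
    # merge_asof needs both frames sorted by their keys; for each interval it
    # picks the last silent span whose from_time is <= interval_x.
    t = pd.merge_asof(intervals_slice, silent_times_slice, left_on="interval_x", right_on="from_time")
    # Mark intervals that lie entirely inside the matched silent span.
    t.loc[(t.interval_x >= t.from_time) & (t.interval_y <= t.to_time), "predicted_value"] = 1
    parts.append(t)
# DataFrame.append was removed in pandas 2.0, so concatenate once at the end.
res = pd.concat(parts)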
The result for the dataframe in the question, using these silent intervals:
d = {'0': [
[1.4501383219954658, 2.058138321995466],
[3.298138321995466, 4.762138321995465],
[7.682138321995467, 8.266138321995465],
[50.01, 51.01]
],
'1': [
[0.0, 0.31253968253968023],
[4.296539682539681, 5.040539682539681],
[8.64053968253968, 9.296539682539679]]}
is as follows:
print(res[["audio_id_x", "interval_x", "interval_y", "predicted_value"]])
   audio_id_x  interval_x  interval_y  predicted_value
0           0    0.579367    0.679367                0
1           0    0.679367    0.779367                0
2           0    0.779367    0.879367                0
3           0    0.879367    0.979367                0
4           0    0.979367    1.079367                0
5           0   50.805830   50.905830                1
6           0   50.905830   51.005830                1
7           0   51.005830   51.105830                0
8           0   51.105830   51.205830                0
9           0   51.205830   51.212938                0
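As a possible variation, the per-file loop can probably be dropped by passing by="audio_id" to merge_asof, which restricts matches to rows of the same recording. The sketch below is self-contained and rests on a few assumptions: the helper name label_silence and the small sample frame are made up for illustration, the dictionary is not empty, and the silent spans within one recording do not overlap (only the nearest preceding span is checked, as in the loop version).

```python
import pandas as pd


def label_silence(intervals, d):
    """Return a copy of `intervals` with predicted_value set to 1 where the
    interval lies entirely inside a silent span of the same audio_id.
    (Hypothetical helper, not part of pandas.)"""
    silent_times = pd.DataFrame(
        [(int(audio_id), start, end)
         for audio_id, spans in d.items()
         for start, end in spans],
        columns=["audio_id", "from_time", "to_time"],
    ).sort_values("from_time")  # merge_asof requires sorted keys

    merged = pd.merge_asof(
        intervals.sort_values("interval_x"),
        silent_times,
        left_on="interval_x",
        right_on="from_time",
        by="audio_id",  # only match spans from the same recording
    )
    # Rows without a preceding span have NaN bounds, so the test is False.
    inside = (merged["interval_x"] >= merged["from_time"]) & (
        merged["interval_y"] <= merged["to_time"]
    )
    merged["predicted_value"] = inside.astype(int)
    return (
        merged.drop(columns=["from_time", "to_time"])
        .sort_values(["audio_id", "interval_x"])
        .reset_index(drop=True)
    )


# Made-up sample data in the same shape as the question's frame.
intervals = pd.DataFrame({
    "audio_id": [0, 0, 0, 0, 1, 1],
    "interval_x": [50.805830, 50.905830, 51.005830, 0.579367, 0.0, 0.1],
    "interval_y": [50.905830, 51.005830, 51.105830, 0.679367, 0.1, 0.2],
    "predicted_value": 0,
})
d = {"0": [[1.45, 2.058], [50.01, 51.01]], "1": [[0.0, 0.3125]]}

labeled = label_silence(intervals, d)
print(labeled)
```

Because audio_id is used as the by key, the result keeps a single audio_id column instead of audio_id_x and audio_id_y.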