How to print the timestamp whenever there is a match between the values of two columns?
I have two dataframes, called dataset1 and dataset2 (shown below). The "pattern" and "SAX" columns contain string values.
dataset1 =
     pattern  tstamps
0    glngsyu  1610460
1    zicobgm  1610466
2    eerptow  .
3    cqbsynt  .
4    zvmqben  .
..   ...
475  rfikekw
476  bnbzvqx
477  rsuhgax
478  ckhloio
479  lbzujtw
480 rows × 1 columns
dataset2 =
        SAX      timestamp
0       hssrlcu  16015
1       ktyuymp  16016
2       xncqmfr  16017
3       aanlmna  16018
4       urvahvo  16019
...     ...      ...
263455  jeivqzo  279470
263456  bzasxgw  279471
263457  jspqnqv  279472
263458  sxwfchj  279473
263459  gxqnhfr  279474
263460 rows × 2 columns
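Is there a method/function to print the "tstamps" of dataset1 whenever a value in dataset1's "pattern" column matches a value in dataset2's "SAX" column?
PS: Here is a code snippet you can use to generate dataset1 and dataset2: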
import pandas as pd
import numpy as np

def sax_generator(num):
    # Build `num` random strings of lowercase letters (ASCII codes 97..122)
    return [''.join(chr(x) for x in np.random.randint(97, 97+26, size=4)) for _ in range(num)]

dataset1 = pd.DataFrame({'pattern': sax_generator(480), 'tstamps': range(480)})
dataset2 = pd.DataFrame({'sax': sax_generator(263460), 'timestamp': range(263460)})
You can use Series.isin:
import pandas as pd

dataset1 = pd.DataFrame([['value1', 1234], ['value2', 12345], ['value3', 12346],
                         ['value4', 12347], ['value5', 12348], ['value6', 12349]],
                        columns=['pattern', 'tstamps'])
dataset2 = pd.DataFrame([['value10', 1234], ['value2', 12345], ['value30', 12346],
                         ['value4', 12347], ['value50', 12347], ['value6', 12347]],
                        columns=['sax', 'timestamp'])

# Keep the rows of dataset1 whose 'pattern' occurs anywhere in dataset2['sax']
timestamps = dataset1[dataset1['pattern'].isin(dataset2['sax'])]['tstamps']
print(timestamps)

# Result (type: pandas.Series), do timestamps.tolist() to get a Python list
1    12345
3    12347
5    12349
Name: tstamps, dtype: int64
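If you also want to see which dataset2 row each pattern matched, an inner merge is one possible alternative to isin. This is only a sketch on a smaller version of the toy data above; the name `matches` is just for illustration.

```python
import pandas as pd

# Small illustrative frames using the same column names as the answer above
dataset1 = pd.DataFrame({'pattern': ['value1', 'value2', 'value3', 'value4'],
                         'tstamps': [1234, 12345, 12346, 12347]})
dataset2 = pd.DataFrame({'sax': ['value10', 'value2', 'value30', 'value4'],
                         'timestamp': [1234, 12345, 12346, 12347]})

# An inner join keeps only the rows where 'pattern' equals some 'sax' value
matches = dataset1.merge(dataset2, left_on='pattern', right_on='sax', how='inner')
print(matches[['pattern', 'tstamps', 'timestamp']])
```

Note that, unlike isin, the merge returns one row per matching pair, so a pattern that occurs several times in dataset2['sax'] shows up several times in the result.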