Repeat pandas rows based on content of a list

Question

I have a large pandas DataFrame df:

Col1    Col2
2       4
3       5

I have a large list:

['2020-08-01', '2021-09-01', '2021-11-01']

I am trying to achieve the following:

Col1    Col2    StartDate
2       4       8/1/2020
3       5       8/1/2020
2       4       9/1/2021
3       5       9/1/2021
2       4       11/1/2021
3       5       11/1/2021

Basically, I want to tile the DataFrame df while adding the elements of the list as a new column. I'm not sure how to approach this.

Answer 1

You can try combining np.tile and np.repeat:

import numpy as np
df.loc[np.tile(df.index, len(lst))].assign(StartDate=np.repeat(lst, len(df)))

Output:

   Col1  Col2   StartDate
0     2     4  2020-08-01
1     3     5  2020-08-01
0     2     4  2021-09-01
1     3     5  2021-09-01
0     2     4  2021-11-01
1     3     5  2021-11-01
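To see why the two calls line up, the sketch below prints the intermediate arrays: np.tile repeats the whole row sequence, while np.repeat repeats each date once per row. It swaps `loc` with labels for `iloc` with positions, an assumption of mine that keeps the row count correct even if df's index contains duplicate labels, and resets the index at the end.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Col1": [2, 3], "Col2": [4, 5]})
lst = ["2020-08-01", "2021-09-01", "2021-11-01"]

# np.tile repeats the whole index sequence: 0, 1, 0, 1, 0, 1
row_labels = np.tile(df.index, len(lst))
print(row_labels)

# np.repeat repeats each date len(df) times, so it lines up with the tiled rows
dates = np.repeat(lst, len(df))
print(dates)

# iloc with positions (instead of loc with labels) is safe even for a
# non-unique index; reset_index gives a clean 0..n-1 index afterwards
positions = np.tile(np.arange(len(df)), len(lst))
out = df.iloc[positions].assign(StartDate=dates).reset_index(drop=True)
print(out)
```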

Answer 2

You can also build a DataFrame from the list and then use merge to do a cross join:

l = ['2020-08-01', '2021-09-01', '2021-11-01']

(df.assign(k=1).merge(pd.DataFrame({'StartDate': l, 'k': 1}), on='k')
   .sort_values('StartDate').drop(columns='k'))

   Col1  Col2   StartDate
0     2     4  2020-08-01
3     3     5  2020-08-01
1     2     4  2021-09-01
4     3     5  2021-09-01
2     2     4  2021-11-01
5     3     5  2021-11-01
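On pandas 1.2 or later, merge also accepts how="cross", which builds the Cartesian product without the dummy k column. A minimal sketch under that version assumption; the stable sort and the final reset_index are my additions so that rows keep their original order within each date and the index comes out as 0..n-1:

```python
import pandas as pd

df = pd.DataFrame({"Col1": [2, 3], "Col2": [4, 5]})
l = ["2020-08-01", "2021-09-01", "2021-11-01"]

out = (
    # how="cross" requires pandas >= 1.2; no helper key column is needed
    df.merge(pd.DataFrame({"StartDate": l}), how="cross")
    # a stable sort keeps the original row order within each date
    .sort_values("StartDate", kind="stable")
    .reset_index(drop=True)
)
print(out)
```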

Answer 3

Use a list comprehension with assign and pd.concat:

l = ['2020-08-01', '2021-09-01', '2021-11-01']
pd.concat([df.assign(startDate=i) for i in l], ignore_index=True)

Output:

   Col1  Col2   startDate
0     2     4  2020-08-01
1     3     5  2020-08-01
2     2     4  2021-09-01
3     3     5  2021-09-01
4     2     4  2021-11-01
5     3     5  2021-11-01
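The answers keep the dates as ISO strings, while the target in the question shows them as M/D/YYYY. Assuming real dates are preferable, a minimal sketch parses them with pd.to_datetime and, only if a display string is really needed, builds the question's format by hand to avoid platform-specific strftime flags. The extra column name StartDateText is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Col1": [2, 3], "Col2": [4, 5]})
l = ["2020-08-01", "2021-09-01", "2021-11-01"]

out = pd.concat([df.assign(StartDate=d) for d in l], ignore_index=True)

# Parse the ISO strings into a real datetime64 column
out["StartDate"] = pd.to_datetime(out["StartDate"])

# Optional display string in M/D/YYYY style (hypothetical column name);
# assembled from month/day/year so no platform-specific "%-m" is needed
dt = out["StartDate"].dt
out["StartDateText"] = (
    dt.month.astype(str) + "/" + dt.day.astype(str) + "/" + dt.year.astype(str)
)
print(out)
```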

Answer 4

I would use concat:

df = pd.DataFrame({'col1': [2,3], 'col2': [4, 5]})
dict_dfs = {k: df for k in ['2020-08-01', '2021-09-01', '2021-11-01']}
pd.concat(dict_dfs)

You can then rename and clean up the index.

              col1  col2
2020-08-01 0     2     4
           1     3     5
2021-09-01 0     2     4
           1     3     5
2021-11-01 0     2     4
           1     3     5
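The rename-and-clean-up step mentioned above could look like the following sketch. It assumes the date keys should become a column called StartDate (my choice of name); reset_index on an unnamed outer level produces a column called level_0, which is then renamed.

```python
import pandas as pd

df = pd.DataFrame({"col1": [2, 3], "col2": [4, 5]})
dict_dfs = {k: df for k in ["2020-08-01", "2021-09-01", "2021-11-01"]}

out = (
    pd.concat(dict_dfs)
    .reset_index(level=0)                      # outer (date) level -> column "level_0"
    .rename(columns={"level_0": "StartDate"})  # give it a meaningful name
    .reset_index(drop=True)                    # replace the repeated 0, 1 labels with 0..n-1
)
print(out)
```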

Answer 5

I would probably use itertools. Note that you can reorder the result with sort_values on column 1 (the date column).

import itertools
df = pd.DataFrame([*itertools.product(df.index, l)]).set_index(0).join(df)

            1  Col1  Col2
0  2020-08-01     2     4
0  2021-09-01     2     4
0  2021-11-01     2     4
1  2020-08-01     3     5
1  2021-09-01     3     5
1  2021-11-01     3     5


pandas

python-3.8