Concat successive rows in pandas based on regex
I have the following dataframe, in which each date is split across two consecutive rows:
index Date Particulars
0 01-12- AVON AGRO
1 2018 NaN
2 01-12- CASH
3 2018 NaN
4 03-12- NEFTOut/UTBIN18337459966/LUNI
5 2018 A MARKETING/SBIN00019
6 03-12- ANJANI TRADERS
7 2018 NaN
8 03-12- NEFTOut/UTBIN18337484160/BIGS
9 2018 MILE PRODUCTS/UTIB000
But I want the output to look like this:
index Date Particulars
0 01-12-2018 AVON AGRO
2 01-12-2018 CASH
4 03-12-2018 NEFTOut/UTBIN18337459966/LUNIA MARKETING/SBIN00019
6 03-12-2018 ANJANI TRADERS
8 03-12-2018 NEFTOut/UTBIN18337484160/BIGSMILE PRODUCTS/UTIB000
I tried df.apply(lambda x: x if re.search('\d{4}$', str(x)) else str(x.shift(-1)) + str(x))
but it gave me:
Date 0 2018\n1 01-12-\n2 2018...
Particulars 0 NaN\n1 ...
dtype: object
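First replace the missing values with empty strings, then join each pair of consecutive rows with groupby and join. This relies on the default RangeIndex, so that df.index // 2 puts rows 0 and 1 in group 0, rows 2 and 3 in group 1, and so on: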
df1 = df.fillna('').groupby(df.index // 2).agg(''.join)
print (df1)
Date Particulars
index
0 01-12-2018 AVON AGRO
1 01-12-2018 CASH
2 03-12-2018 NEFTOut/UTBIN18337459966/LUNIA MARKETING/SBIN0...
3 03-12-2018 ANJANI TRADERS
4 03-12-2018 NEFTOut/UTBIN18337484160/BIGSMILE PRODUCTS/UTI...
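To make the groupby approach easy to check, here is a self-contained sketch. The way the frame is built is an assumption (string values, with NaN for the blank cells); the question does not show how the data was loaded.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frame: strings, with NaN for blanks.
df = pd.DataFrame({
    'Date': ['01-12-', '2018', '01-12-', '2018', '03-12-', '2018',
             '03-12-', '2018', '03-12-', '2018'],
    'Particulars': ['AVON AGRO', np.nan, 'CASH', np.nan,
                    'NEFTOut/UTBIN18337459966/LUNI', 'A MARKETING/SBIN00019',
                    'ANJANI TRADERS', np.nan,
                    'NEFTOut/UTBIN18337484160/BIGS', 'MILE PRODUCTS/UTIB000'],
})

# Integer division of the default RangeIndex labels rows 0-1 as group 0,
# rows 2-3 as group 1, and so on; ''.join then concatenates each pair.
df1 = df.fillna('').groupby(df.index // 2).agg(''.join)
print(df1)
```

If the index is not a default RangeIndex, calling df.reset_index(drop=True) first should restore the pairing.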
Or select the even-positioned and odd-positioned rows and add them together:
df1 = df.fillna('')
df1 = df1.iloc[::2].reset_index(drop=True) + df1.iloc[1::2].reset_index(drop=True)
print (df1)
Date Particulars
0 01-12-2018 AVON AGRO
1 01-12-2018 CASH
2 03-12-2018 NEFTOut/UTBIN18337459966/LUNIA MARKETING/SBIN0...
3 03-12-2018 ANJANI TRADERS
4 03-12-2018 NEFTOut/UTBIN18337484160/BIGSMILE PRODUCTS/UTI...
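The desired output in the question keeps the original labels 0, 2, 4, ... rather than renumbering the rows. A possible variation, sketched below on a shortened copy of the sample data, is to give the odd rows the labels of the even rows before adding. The names filled, even and odd are illustrative, and the sketch assumes an even number of rows.

```python
import numpy as np
import pandas as pd

# Shortened, assumed reconstruction of the sample frame.
df = pd.DataFrame({
    'Date': ['01-12-', '2018', '01-12-', '2018', '03-12-', '2018'],
    'Particulars': ['AVON AGRO', np.nan, 'CASH', np.nan,
                    'NEFTOut/UTBIN18337459966/LUNI', 'A MARKETING/SBIN00019'],
})

filled = df.fillna('')
even = filled.iloc[::2]            # rows 0, 2, 4: first half of each record
odd = filled.iloc[1::2].copy()     # rows 1, 3, 5: second half of each record
odd.index = even.index             # align on the even labels (needs equal lengths)

df1 = even + odd                   # element-wise string concatenation
print(df1)
```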
It can also be solved with a regular expression:
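df1 = df.fillna('')
# True for rows whose Date ends in a four-digit year (raw string avoids an invalid-escape warning)
m = df1['Date'].str.contains(r'\d{4}$')
# add each row that precedes a year row to the year row that follows it
df1 = df1[m.shift(-1, fill_value=False)].reset_index(drop=True) + df1[m].reset_index(drop=True)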
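If the rows do not alternate strictly (a record may already be complete, or may be split over more than two rows), the same regex mask can drive a groupby instead. This is a sketch of that variation, not part of the answer above. The sample data is made up for illustration, and it assumes that the last row of every record is the one whose Date ends in a four-digit year.

```python
import numpy as np
import pandas as pd

# Hypothetical irregular data: records span two, one and three rows.
df = pd.DataFrame({
    'Date': ['01-12-', '2018', '02-12-2018', '03-', '12-', '2018'],
    'Particulars': ['AVON AGRO', np.nan, 'CASH', 'ANJANI', ' TRADERS', np.nan],
})

filled = df.fillna('')
# True on the row that closes a record (Date ends in four digits).
ends_record = filled['Date'].str.contains(r'\d{4}$')
# A new record starts on the row after each closing row, so shifting the mask
# down by one and taking the cumulative sum yields one id per record.
group_id = ends_record.shift(fill_value=False).cumsum().to_numpy()

df1 = filled.groupby(group_id).agg(''.join)
print(df1)
```

Passing the ids as a NumPy array keeps the result index unnamed, which avoids a clash with the Date column.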