How to add an index that increments on groups of rows of text not including NaT
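I have a DataFrame with a column of codes in which runs of consecutive text rows are followed by runs of consecutive missing-value rows (NaN).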
codes
FKW
FCJ
XQ8
1L9
NaN
NaN
PNU
LIT
NaN
422
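A run of alphanumeric codes followed by a run of missing values (NaN) forms one cycle. I want to add a cycle index column (index) that increments whenever the next cycle starts. The next cycle starts when a missing value (NaN) is followed by a code (an alphanumeric value).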
codes index
FKW 1
FCJ 1
XQ8 1
1L9 1
NaN 1
NaN 1
PNU 2 next group starts here
LIT 2
NaN 2
422 3 next group starts here
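Here is the code that generates the example above:
import random
import string

import numpy as np
import pandas as pd

def id_generator(size=3, chars=string.ascii_uppercase + string.digits):
    # Build a random alphanumeric code of the given length
    return ''.join(random.choice(chars) for _ in range(size))

num_rows = 10
data = np.array([id_generator() for i in range(num_rows)])
df = pd.DataFrame(data, columns=['codes'])
df.loc[[4, 5, 8], 'codes'] = np.nan  # blank out rows 4, 5 and 8
print('what I have')
print(df)
print('what I want')
df['index'] = [1, 1, 1, 1, 1, 1, 2, 2, 2, 3]
print(df)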
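How can I generate the index column?
The simplest approach I can think of is to iterate over the DataFrame's rows and keep track of whether the last value was NaN.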
index = []
index_counter = 1
last_was_NaN = False
for row in df.itertuples():
    # row[0] is the pandas index, so row[1] is the first data column
    if pd.isna(row[1]):
        last_was_NaN = True
    elif last_was_NaN:  # text right after NaN: a new cycle starts, so increase the counter
        last_was_NaN = False
        index_counter += 1
    index.append(index_counter)  # record the cycle index for every row
df['index'] = index
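To try the loop on its own, here is a minimal sketch that wraps it in a function and runs it on the sample data from the question. The function name `cycle_index_loop` and the variable names `sample` and `leading_nan` are made up for this illustration. The second call shows what the loop does when the column starts with NaN: because the counter starts at 1, the leading NaN rows get 1 and the first codes get 2.

```python
import numpy as np
import pandas as pd


def cycle_index_loop(values):
    """Return one cycle index per value (hypothetical helper, not from the thread)."""
    index = []
    index_counter = 1
    last_was_nan = False
    for value in values:
        if pd.isna(value):
            last_was_nan = True
        elif last_was_nan:
            # A code directly after a run of NaN opens the next cycle
            last_was_nan = False
            index_counter += 1
        index.append(index_counter)
    return index


# Sample column copied from the question
sample = pd.DataFrame(
    {"codes": ["FKW", "FCJ", "XQ8", "1L9", np.nan, np.nan,
               "PNU", "LIT", np.nan, "422"]}
)
sample["index"] = cycle_index_loop(sample["codes"])
print(sample)

# Edge case: a column that starts with NaN
leading_nan = cycle_index_loop([np.nan, "AAA", "BBB", np.nan, "CCC"])
print(leading_nan)
```

Looping in Python is easy to read, but it will be slow on large frames compared with a vectorized approach.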
Try this:
s = df.codes.notna()
df['index'] = (s & ~(s.shift(fill_value=False))).cumsum()
Out[718]:
codes index
0 FKW 1
1 FCJ 1
2 XQ8 1
3 1L9 1
4 NaN 1
5 NaN 1
6 PNU 2
7 LIT 2
8 NaN 2
9 422 3
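To see why this works, the one-liner can be split into named steps. This is only a sketch on the sample data from the question; the intermediate names `is_code`, `prev_is_code`, `starts_cycle` and `steps` are made up for illustration. A row starts a cycle when it holds a code and the row before it does not, and the cumulative sum of those start flags is the cycle index.

```python
import numpy as np
import pandas as pd

# Sample column copied from the question
df = pd.DataFrame(
    {"codes": ["FKW", "FCJ", "XQ8", "1L9", np.nan, np.nan,
               "PNU", "LIT", np.nan, "422"]}
)

# True where the row holds a code, False where it is NaN
is_code = df["codes"].notna()

# The same flags moved down one row; the first row has no predecessor,
# so it is filled with False ("the previous row was not a code")
prev_is_code = is_code.shift(fill_value=False)

# A cycle starts where this row is a code and the previous row is not
starts_cycle = is_code & ~prev_is_code

# Counting the starts seen so far gives the cycle index
df["index"] = starts_cycle.cumsum()

# Side-by-side view of every intermediate step
steps = pd.DataFrame(
    {
        "codes": df["codes"],
        "is_code": is_code,
        "prev_is_code": prev_is_code,
        "starts_cycle": starts_cycle,
        "index": df["index"],
    }
)
print(steps)
```

One thing to keep in mind: if the column begins with NaN, those leading rows get index 0 with this approach, since no cycle has started yet.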