Replace values from one column in a DataFrame

import pandas as pd
import numpy as np
import ast

pd.options.display.max_columns = 20

I have a dataframe column season like this (first 20 entries):

      season
0     2006-07
1     2007-08
2     2008-09
3     2009-10
4     2010-11
5     2011-12
6     2012-13
7     2013-14
8     2014-15
9     2015-16
10    2016-17
11    2017-18
12    2018-19
13     Career
14     season
15    2018-19
16     Career
17     season
18    2017-18
19    2018-19

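Each block starts with season and ends with Career. I want to replace the years with numbers that start at 1 and stop at the row where Career appears. I want it to look like this: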

      season
0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9     10
10    11
11    12
12    13
13     Career
14     season
15    1
16     Career
17     season
18    1
19    2

So the count should reset every time season appears in the column and stop every time Career appears.

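Build a mask with Series.isin that flags the Career and season rows, compare the mask with its shifted values to label consecutive groups, then use GroupBy.cumcount as the counter within each group: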

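# Flag the marker rows that delimit each block
s = df['season'].isin(['Career', 'season'])
# Keep the markers as they are; number the other rows within each consecutive run
df['new'] = np.where(s, df['season'], df.groupby(s.ne(s.shift()).cumsum()).cumcount() + 1)
print(df)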
     season     new
0   2006-07       1
1   2007-08       2
2   2008-09       3
3   2009-10       4
4   2010-11       5
5   2011-12       6
6   2012-13       7
7   2013-14       8
8   2014-15       9
9   2015-16      10
10  2016-17      11
11  2017-18      12
12  2018-19      13
13   Career  Career
14   season  season
15  2018-19       1
16   Career  Career
17   season  season
18  2017-18       1
19  2018-19       2
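
To see why this works, it may help to look at the intermediate values one step at a time. The sketch below uses a small made-up frame with the same layout as the question; the names group_id, counter and steps are only illustrative and do not come from the original post.

```python
import pandas as pd

# Small hypothetical sample with the same layout as the question's data.
df = pd.DataFrame(
    {"season": ["2006-07", "2007-08", "Career", "season", "2018-19", "Career"]}
)

# True on the marker rows, False on the year rows.
s = df["season"].isin(["Career", "season"])

# s.ne(s.shift()) is True wherever the mask differs from the previous row
# (the first row always counts as a change because it is compared with NaN).
# The running sum then gives every consecutive run its own id: 1, 1, 2, 2, 3, 4.
group_id = s.ne(s.shift()).cumsum()

# cumcount() numbers the rows inside each run from 0, so add 1: 1, 2, 1, 2, 1, 1.
counter = df.groupby(group_id).cumcount() + 1

# Side-by-side view of every step.
steps = pd.DataFrame(
    {"season": df["season"], "is_marker": s, "group_id": group_id, "counter": counter}
)
print(steps)
```

The marker rows get counter values too, but np.where (or the ~s selection) discards them, so only the year rows end up numbered.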

To overwrite the season column instead:

s = df['season'].isin(['Career', 'season'])
# Overwrite only the non-marker rows with the per-run counter
df.loc[~s, 'season'] = df.groupby(s.ne(s.shift()).cumsum()).cumcount() + 1
print(df)
    season
0        1
1        2
2        3
3        4
4        5
5        6
6        7
7        8
8        9
9       10
10      11
11      12
12      13
13  Career
14  season
15       1
16  Career
17  season
18       1
19       2