Split a numeric ID column into two using pandas
              DateTime  Junction  Vehicles           ID
0  2015-11-01 00:00:00         1        15  20151101001
1  2015-11-01 01:00:00         1        13  20151101011
2  2015-11-01 02:00:00         1        10  20151101021
3  2015-11-01 03:00:00         1         7  20151101031
4  2015-11-01 04:00:00         1         9  20151101041
5  2015-11-01 05:00:00         1         6  20151101051
6  2015-11-01 06:00:00         1         9  20151101061
7  2015-11-01 07:00:00         1         8  20151101071
8  2015-11-01 08:00:00         1        11  20151101081
9  2015-11-01 09:00:00         1        12  20151101091
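I want to split the ID column into two separate columns, so that the first 4 digits are in one column and the remaining digits are in the second.
The code I tried:
new_ID = data.apply(lambda x: x.rsplit(4))
But it did not work. How can I do this with pandas?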
id_col = 'ID'  # name of the column to split
df[id_col].map(lambda x: int(str(x)[:4]))  # first 4 digits as an integer
df[id_col].map(lambda x: str(x)[:4])  # first 4 digits as a string
Option 1
The simplest and most direct: use the str accessor.
v = df.ID.astype(str)
df['Year'], df['ID'] = v.str[:4], v.str[4:]
df
              DateTime  Junction  Vehicles       ID  Year
0  2015-11-01 00:00:00         1        15  1101001  2015
1  2015-11-01 01:00:00         1        13  1101011  2015
2  2015-11-01 02:00:00         1        10  1101021  2015
3  2015-11-01 03:00:00         1         7  1101031  2015
4  2015-11-01 04:00:00         1         9  1101041  2015
5  2015-11-01 05:00:00         1         6  1101051  2015
6  2015-11-01 06:00:00         1         9  1101061  2015
7  2015-11-01 07:00:00         1         8  1101071  2015
8  2015-11-01 08:00:00         1        11  1101081  2015
9  2015-11-01 09:00:00         1        12  1101091  2015
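For anyone who wants to run Option 1 end to end, here is a minimal sketch. The three-row frame is a hypothetical stand-in for the question's data, and casting Year back to an integer is my own addition, not part of the answer above.

```python
import pandas as pd

# Hypothetical sample frame mirroring the first rows of the question's data.
df = pd.DataFrame({
    "Junction": [1, 1, 1],
    "Vehicles": [15, 13, 10],
    "ID": [20151101001, 20151101011, 20151101021],
})

# Slice the string form of each ID: first 4 characters, then the rest.
id_text = df["ID"].astype(str)
df["Year"] = id_text.str[:4].astype(int)  # cast the year back to integers
df["ID"] = id_text.str[4:]                # kept as strings, so leading zeros survive

print(df)
```

Keeping the remainder as a string is a deliberate choice here: if it ever starts with a zero, converting it to an integer would silently drop that digit.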
Option 2
Use str.extract with named groups.
v = df.ID.astype(str).str.extract(r'(?P<Year>\d{4})(?P<ID>.*)', expand=True)
df = pd.concat([df.drop(columns='ID'), v], axis=1)
df
              DateTime  Junction  Vehicles  Year       ID
0  2015-11-01 00:00:00         1        15  2015  1101001
1  2015-11-01 01:00:00         1        13  2015  1101011
2  2015-11-01 02:00:00         1        10  2015  1101021
3  2015-11-01 03:00:00         1         7  2015  1101031
4  2015-11-01 04:00:00         1         9  2015  1101041
5  2015-11-01 05:00:00         1         6  2015  1101051
6  2015-11-01 06:00:00         1         9  2015  1101061
7  2015-11-01 07:00:00         1         8  2015  1101071
8  2015-11-01 08:00:00         1        11  2015  1101081
9  2015-11-01 09:00:00         1        12  2015  1101091
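One thing worth knowing about the str.extract route is what happens when a value does not fit the pattern. The sketch below assumes a small hypothetical frame with one deliberately malformed ID (123), and tightens the second group to \d+ so that only all-digit remainders match. Neither is from the answer above.

```python
import pandas as pd

# Hypothetical data; the last ID is deliberately too short to match.
df = pd.DataFrame({
    "Vehicles": [15, 13, 10],
    "ID": [20151101001, 20151101011, 123],
})

# A raw string keeps \d from being read as a string escape.
# The named groups become the column names of the extracted frame.
parts = df["ID"].astype(str).str.extract(r"(?P<Year>\d{4})(?P<ID>\d+)", expand=True)

# Swap the original ID column for the two extracted columns.
result = pd.concat([df.drop(columns="ID"), parts], axis=1)

print(result)
```

Rows that do not match come back as missing values in both new columns, which makes bad IDs easy to find with result["Year"].isna().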
Here is a numeric solution (assuming every value in the ID column has the same number of digits):
In [10]: df['Year'], df['ID'] = df['ID'] // 10**7, df['ID'] % 10**7
In [11]: df
Out[11]:
              DateTime  Junction  Vehicles       ID  Year
0  2015-11-01 00:00:00         1        15  1101001  2015
1  2015-11-01 01:00:00         1        13  1101011  2015
2  2015-11-01 02:00:00         1        10  1101021  2015
3  2015-11-01 03:00:00         1         7  1101031  2015
4  2015-11-01 04:00:00         1         9  1101041  2015
5  2015-11-01 05:00:00         1         6  1101051  2015
6  2015-11-01 06:00:00         1         9  1101061  2015
7  2015-11-01 07:00:00         1         8  1101071  2015
8  2015-11-01 08:00:00         1        11  1101081  2015
9  2015-11-01 09:00:00         1        12  1101091  2015
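A caveat with the arithmetic approach: the remainder is an integer, so a leading zero disappears. The sketch below uses the built-in divmod on a Series and a hypothetical January ID (20150101001) to show the effect. The Rest and RestPadded column names are made up for illustration.

```python
import pandas as pd

# The second ID is hypothetical: its remainder starts with a zero.
df = pd.DataFrame({"ID": [20151101001, 20150101001]})

# divmod on a Series returns (quotient, remainder) as two Series.
year, rest = divmod(df["ID"], 10**7)
df["Year"], df["Rest"] = year, rest

# As an integer, 0101001 becomes 101001. Zero-padding the string
# form back to 7 characters restores the dropped digit.
df["RestPadded"] = df["Rest"].astype(str).str.zfill(7)

print(df)
```

If the remainder is only ever used as a number this does not matter. If it is an identifier that must keep its width, the string-based options are the safer choice.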