Pandas: replacing values in columns when the to_replace argument is a tuple containing tuples
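I am decoding values from the NLSY 79. They are occupation codes grouped by industry. Each industry covers a range of occupations; for example, all occupations from 17 to 29 belong to Agriculture, Forestry & Fisheries. I tried three strategies: two raise errors, and the third does not store the values in the DataFrame.
This is the code I run (respondents list up to 5 jobs, and all of them are included in the data):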
df[['Job1', 'Job2', 'Job3', 'Job4', 'Job5']].replace(to_replace=jobs['code'], value=jobs['true'], inplace=True)
Strategy 1
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
jobs = {'code': (tuple(range(17, 29)), ...),
        'true': ('Agriculture, Forestry & Fisheries', ...)}
Strategy 2
TypeError: Cannot compare types 'ndarray(dtype=float64)' and 'range'
jobs = {'code': (range(17, 29), ...),
        'true': ('Agriculture, Forestry & Fisheries', ...)}
Strategy 3
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
jobs = {'code': (any(tuple(range(17, 29))), any(tuple(range(47, 58))), ...),
        'true': ('Agriculture, Forestry & Fisheries', 'Mining', ...)}
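I think adjusting the third strategy or the execution code is the best route, but I am still new to coding and not sure what that adjustment would be. Any suggestions on how to solve this?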
Input:
Job1 ...
0 339 ...
1 757 ...
2 739 ...
3 448 ...
Desired Output:
Job1 ...
0 Utilities ...
1 Professional ...
2 Professional ...
3 Retail ...
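For what it's worth, the third attempt runs without an exception yet changes nothing, and there seem to be two separate reasons. `any(tuple(range(17, 29)))` is evaluated by Python before pandas sees it and collapses to the single boolean `True`, so no job codes are left in `to_replace`. In addition, `df[['Job1', ...]]` returns a new DataFrame, so a replace on it does not reach `df` unless the result is assigned back. A minimal sketch; the toy frame and the one-entry mapping are made up for illustration and are not the NLSY 79 data:

```python
import pandas as pd

# The whole range is reduced to one boolean before pandas ever sees it.
collapsed = any(tuple(range(17, 29)))
print(collapsed)  # True

# Hypothetical toy data, not the NLSY 79 file.
df = pd.DataFrame({'Job1': [20, 50], 'Job2': [70, 20]})

# Selecting with a list of columns returns a new DataFrame, so replacing
# values on it leaves df untouched.
subset = df[['Job1', 'Job2']]
replaced = subset.replace({20: 'Agriculture, Forestry & Fisheries'})
untouched = df['Job1'].tolist()  # still the original codes

# Assigning the result back to the same columns is what persists the change.
df[['Job1', 'Job2']] = replaced
print(df)
```

The same reasoning applies to `inplace=True`: it modifies the temporary selection, which is what the SettingWithCopyWarning is pointing at.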
job = {'code': (list(range(17, 29)),
                list(range(47, 58)),
                list(range(67, 78)), ...),
       'true': ('Agriculture, Forestry & Fisheries',
                'Mining',
                'Construction', ...)}
Solved. Not the fastest method, but it works.
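job = {'code': (list(range(17, 29)), ...),
       'true': ('Agriculture, Forestry & Fisheries', ...)}
# Iterating over df_jobs is assumed to yield the job column names.
for i, x in enumerate(job['code']):
    for key in df_jobs:
        # Assign the result back instead of calling replace(inplace=True)
        # on df[key], which may act on a temporary copy.
        df[key] = df[key].replace(to_replace=x, value=[job['true'][i]]*len(x))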
Try this:
df1
Job1
0 20
1 50
2 70
job = {'code': (list(range(17, 29)),
                list(range(47, 58)),
                list(range(67, 78))),
       'true': ('Agriculture, Forestry & Fisheries',
                'Mining',
                'Construction')}
pd_replace = pd.DataFrame(job).explode('code')
df1.replace(dict(zip(pd_replace['code'], pd_replace['true'])))
Job1
0 Agriculture, Forestry & Fisheries
1 Mining
2 Construction
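Note that `replace` returns a new DataFrame here, so the result still has to be assigned back for the decoded labels to be stored, which was the sticking point in the question. A minimal sketch, assuming the job columns are named as in the question; the sample codes (including the unmapped 999) are made up, and the lookup table is built with a plain dict comprehension instead of `explode`, which should give the same mapping:

```python
import pandas as pd

job = {'code': (list(range(17, 29)),
                list(range(47, 58)),
                list(range(67, 78))),
       'true': ('Agriculture, Forestry & Fisheries',
                'Mining',
                'Construction')}

# Flatten the grouped codes into one {code: label} lookup table.
code_to_label = {code: label
                 for codes, label in zip(job['code'], job['true'])
                 for code in codes}

# Hypothetical sample data; only two of the five job columns are shown.
df = pd.DataFrame({'Job1': [20, 50, 70], 'Job2': [68, 17, 999]})
job_cols = ['Job1', 'Job2']

# Assign the result back so the decoded values are stored in df.
# Codes missing from the lookup table (999 here) are left unchanged.
df[job_cols] = df[job_cols].replace(code_to_label)
print(df)
```

Building one flat dictionary up front means each column is scanned once, rather than once per industry as in the nested loop.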