Replace values in columns of dataframe based on dictionary not working
You can read the exact problem below, but this is essentially what I am trying to do:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})
newVals = dict({'A0': 0,
                'A1': 1,
                'A2': 2,
                'A3': 3})
for key, value in newVals.items():
    df1['A'].replace({key, value})
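When I do this, the resulting dataframe is unchanged.
Original post:
Okay, I am analyzing workplace accident data from OSHA (osha_accident_injury.csv). Each row is a specific person injured in an accident. Each column is a characteristic of the person or of the accident itself, and each characteristic is encoded as an integer with a corresponding string value. I want to replace each integer with its string definition. The number-to-string mappings are listed in osha_accident_lookup.csv. The mappings for the accident codes can be found in osha_accident_dictionary.csv, but I typed them into a mapping by hand.
However, some integers map to more than one string, so the meaning also depends on the accident_code in osha_accident_lookup.csv. I therefore created a list containing one dictionary per accident code (each mapping integers to string values). But when I try to replace the values in each column using its dictionary, it returns the original dataframe rather than one with string values. Can anyone see what I am doing wrong?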
# create list of all distinct accident codes
code_list = []
for index in osha_accident_lookup.index:
    if osha_accident_lookup['accident_code'][index] not in code_list:
        code_list.append(osha_accident_lookup['accident_code'][index])
# remove values not found in actual data
code_list.remove('PTYP')
code_list.remove('COST')
code_list.remove('ENDU')
# create list of dictionaries, s.t. each item maps accident number to accident value
# there is a unique map for each unique accident code
mapList = []
for code in code_list:
    temp_df = pd.DataFrame(osha_accident_lookup[osha_accident_lookup['accident_code'] == code])
    temp_map = dict(zip(temp_df['accident_number'], temp_df['accident_value']))
    mapList.append(temp_map)
# create dictionary that maps code from osha_accident_lookup to column name in osha_accident_injury.csv
code_to_column = dict({"OCC": "occ_code", 'CAUS': 'fat_cause', 'DEGR': 'degree_of_inj',
                       "OPER": "const_op_cause", "EN": 'evn_factor', "FT": 'event_type',
                       "HU": 'hum_factor', "IN": "nature_of_inj", "BD": "part_of_body",
                       "SO": "src_of_injury", "TASK": 'task_assigned'})
# replace numbers in injury data with string values of what the #'s represent
iterator = 0
for item in mapList:
    code = code_list[iterator]
    col_name = code_to_column[code]
    for key, value in item.items():
        osha_accident_injury[col_name].replace({key: value})
    iterator += 1
osha_accident_injury.csv (first 10 rows):
FIELD1 | summary_nr | rel_insp_nr | age | sex | nature_of_inj | part_of_body | src_of_injury | event_type | evn_factor | hum_factor | occ_code | degree_of_inj | task_assigned | hazsub | const_op | const_op_cause | fat_cause | fall_distance | fall_ht | injury_line_nr | load_dt |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 18 | 10006732 | 0 |  | 10.0 | 12.0 | 15.0 | 13.0 | 18.0 | 1.0 | 0.0 | 1.0 | 1.0 |  | 0.0 | 0.0 | 0.0 |  |  | 1 | 2017-03-20 01:00:11 EDT |
1 | 26 | 159996 | 0 |  | 21.0 | 19.0 | 42.0 | 5.0 | 13.0 | 9.0 | 0.0 | 1.0 | 1.0 |  | 0.0 | 0.0 | 0.0 |  |  | 1 | 2017-03-20 01:00:11 EDT |
2 | 34 | 10013225 | 0 |  | 21.0 | 4.0 | 19.0 | 8.0 | 18.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0270 | 0.0 | 0.0 | 0.0 |  |  | 1 | 2017-03-20 01:00:11 EDT |
3 | 42 | 10014439 | 0 |  | 1.0 | 10.0 | 24.0 | 2.0 | 3.0 | 1.0 | 0.0 | 2.0 | 2.0 |  | 0.0 | 0.0 | 0.0 |  |  | 1 | 2017-03-20 01:00:11 EDT |
4 | 59 | 19523588 | 0 |  | 5.0 | 4.0 | 16.0 | 10.0 | 9.0 | 1.0 | 0.0 | 2.0 | 1.0 |  | 0.0 | 0.0 | 0.0 |  |  | 1 | 2017-03-20 01:00:11 EDT |
5 | 59 | 19523588 | 0 |  | 21.0 | 5.0 | 16.0 | 8.0 | 9.0 | 14.0 | 0.0 | 2.0 | 2.0 |  | 0.0 | 0.0 | 0.0 |  |  | 2 | 2017-03-20 01:00:11 EDT |
6 | 59 | 19523588 | 0 |  | 21.0 | 5.0 | 16.0 | 6.0 | 9.0 | 14.0 | 0.0 | 2.0 | 2.0 |  | 0.0 | 0.0 | 0.0 |  |  | 3 | 2017-03-20 01:00:11 EDT |
7 | 59 | 19523588 | 0 |  | 21.0 | 5.0 | 16.0 | 8.0 | 9.0 | 14.0 | 0.0 | 2.0 | 2.0 |  | 0.0 | 0.0 | 0.0 |  |  | 4 | 2017-03-20 01:00:11 EDT |
8 | 59 | 19523588 | 0 |  | 21.0 | 5.0 | 16.0 | 8.0 | 9.0 | 14.0 | 0.0 | 2.0 | 2.0 |  | 0.0 | 0.0 | 0.0 |  |  | 5 | 2017-03-20 01:00:11 EDT |
9 | 59 | 19523588 | 0 |  | 21.0 | 5.0 | 16.0 | 8.0 | 9.0 | 14.0 | 0.0 | 2.0 | 2.0 |  | 0.0 | 0.0 | 0.0 |  |  | 6 | 2017-03-20 01:00:11 EDT |
osha_accident_lookup.csv (first 10 rows):
accident_code | accident_number | accident_value | accident_letter | load_date |
---|---|---|---|---|
OPER | 1 | Backfilling and compacting |  | 2018-11-09 20:56:02 EST |
OPER | 2 | Bituminous concrete placement |  | 2018-11-09 20:56:02 EST |
OPER | 3 | Construction of playing fields, tennis courts |  | 2018-11-09 20:56:02 EST |
SO | 1 | AIRCRAFT |  | 2018-11-09 20:56:02 EST |
SO | 2 | AIR PRESSURE |  | 2018-11-09 20:56:02 EST |
SO | 3 | ANIMAL/INS/REPT/ETC. |  | 2018-11-09 20:56:02 EST |
OCC | 757 | Separating, filtering & clarifying mach. operators |  | 2018-11-09 20:56:02 EST |
OCC | 758 | Compressing and compacting machine operators |  | 2018-11-09 20:56:02 EST |
OCC | 759 | Painting and paint spraying machine operators |  | 2018-11-09 20:56:02 EST |
OCC | 763 | Roasting and baking machine operators, food |  | 2018-11-09 20:56:02 EST |
osha_data_dictionary.csv (first 10 rows):
table_name | column_name | attribute_name | definition | column_datatype | display_name |
---|---|---|---|---|---|
osha_accident | nonbuild_ht | Non Building Height | Construction - height in feet when not a building | Numeric, Length=4 | Height for Non-Building |
osha_accident | project_type | Project Type | Construction - project type (code table PTYP) | Alphanumeric, Length:1 | Project Type |
osha_accident | event_date | Event Date | Date of accident (yyyymmdd) | Numeric, Length=8 | Event Date |
osha_accident | event_keyword | Event Keyword | Contains comma separated keywords entered by ERG during the review process. | Alphanumeric, Length:200 | Event Keyword |
osha_accident | report_id | Report ID | Identifies the OSHA federal or state reporting jurisdiction | Numeric, Length=7 | Reporting ID |
osha_accident | event_desc | Event Description | Short description of event | Alphanumeric, Length:60 | Event Description |
osha_accident | load_dt | Load Date Timestamp | The date the load was completed. | date | No Label |
osha_accident | summary_nr | Summary NR | Identifies the accident OSHA-170 form | Numeric, Length=9 | Summary NR |
osha_accident | fatality | Fatality | X=Fatality is associated with accident | Alphanumeric, Length:1 | Fatality |
Based on your example, try this approach.
df1['A'] = df1['A'].map(newVals)
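To expand on this: `Series.replace` and `Series.map` both return a new Series rather than changing `df1` in place, so the result has to be assigned back to the column. The sketch below shows that on the example from the question and then applies the same idea column by column. The small `lookup` and `injury` frames are made-up stand-ins built from the sample rows above, not the real CSV files, and integer codes are assumed for simplicity.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
newVals = {'A0': 0, 'A1': 1, 'A2': 2, 'A3': 3}

# replace() returns a new Series; the result is discarded here,
# so df1 itself is left untouched.
df1['A'].replace(newVals)
unchanged = df1['A'].tolist()

# Option 1: pass the whole dictionary to replace() and keep the result.
replaced = df1['A'].replace(newVals)

# Option 2: map(), as suggested in the answer above.
mapped = df1['A'].map(newVals)

# Hypothetical stand-ins for osha_accident_lookup and osha_accident_injury,
# using a few values from the sample rows (assumed to be integer codes).
lookup = pd.DataFrame({
    'accident_code': ['OPER', 'OPER', 'SO', 'SO'],
    'accident_number': [1, 2, 1, 2],
    'accident_value': ['Backfilling and compacting',
                       'Bituminous concrete placement',
                       'AIRCRAFT',
                       'AIR PRESSURE'],
})
code_to_column = {'OPER': 'const_op_cause', 'SO': 'src_of_injury'}
injury = pd.DataFrame({'const_op_cause': [1, 2, 1],
                       'src_of_injury': [2, 2, 1]})

# Build one dictionary per accident code and assign the mapped column back.
for code, group in lookup.groupby('accident_code'):
    mapping = dict(zip(group['accident_number'], group['accident_value']))
    col_name = code_to_column[code]
    injury[col_name] = injury[col_name].map(mapping)
```

Note that `map` turns any value missing from the dictionary into NaN, whereas `replace` leaves such values as they are, so `replace` may be the safer choice if some codes have no entry in the lookup table.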