Reading a CSV using pandas: reassigning values 1 to 40 for soil types, and 1 to 7 for forest cover types

Question

The CSV input file has 56 columns. Sample data is shown below; please bear with my formatting.

Id,Elevation,Aspect,Slope,Horizontal_Distance_To_Hydrology,Vertical_Distance_To_Hydrology,Horizontal_Distance_To_Roadways,Hillshade_9am,Hillshade_Noon,Hillshade_3pm,Horizontal_Distance_To_Fire_Points,Wilderness_Area1,Wilderness_Area2,Wilderness_Area3,Wilderness_Area4,Soil_Type1,Soil_Type2,Soil_Type3,Soil_Type4,Soil_Type5,Soil_Type6,Soil_Type7,Soil_Type8,Soil_Type9,Soil_Type10,Soil_Type11,Soil_Type12,Soil_Type13,Soil_Type14,Soil_Type15,Soil_Type16,Soil_Type17,Soil_Type18,Soil_Type19,Soil_Type20,Soil_Type21,Soil_Type22,Soil_Type23,Soil_Type24,Soil_Type25,Soil_Type26,Soil_Type27,Soil_Type28,Soil_Type29,Soil_Type30,Soil_Type31,Soil_Type32,Soil_Type33,Soil_Type34,Soil_Type35,Soil_Type36,Soil_Type37,Soil_Type38,Soil_Type39,Soil_Type40,Cover_Type
1,2596,51,3,258,0,510,221,232,148,6279,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,5
2,2590,56,2,212,-6,390,220,235,151,6225,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,5
3,2804,139,9,268,65,3180,234,238,135,6121,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2
4,2785,155,18,242,118,3090,238,238,122,6211,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,2
5,2595,45,2,153,-1,391,220,234,150,6172,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,5
6,2579,132,6,300,-15,67,230,237,140,6031,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2
7,2606,45,7,270,5,633,222,225,138,6256,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,5
8,2605,49,4,234,7,573,222,230,144,6228,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,5
9,2617,45,9,240,56,666,223,221,133,6244,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,5
10,2612,59,10,247,11,636,228,219,124,6230,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,5

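I need to transform this data. The requirement is to drop the groups of binary (0 or 1) columns and replace each group with a single new column holding a value from a range: 1 to 4 for Wilderness_Area, and 1 to 40 for Soil_Type.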

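1. Drop the columns Wilderness_Area1 through Wilderness_Area4 and add a single column Wilderness_Area, assigned a value from 1 to 4 depending on the input row. For example, where a row in the sample input above previously had
- Wilderness_Area1 = 1, it should now have Wilderness_Area = 1.
- Wilderness_Area2 = 1, it should now have Wilderness_Area = 2.
- Wilderness_Area3 = 1, it should now have Wilderness_Area = 3.
- Wilderness_Area4 = 1, it should now have Wilderness_Area = 4.
2. Drop the columns Soil_Type1 through Soil_Type40 and add a single column Soil_Type, assigned a value from 1 to 40 depending on the input row. For example, where a row in the sample input above previously had
- Soil_Type1 = 1, it should now have Soil_Type = 1.
- Soil_Type2 = 1, it should now have Soil_Type = 2.
- Soil_Type3 = 1, it should now have Soil_Type = 3.
- Soil_Type4 = 1, it should now have Soil_Type = 4.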
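
The mapping described in the list above can be sketched with `idxmax`, which returns, for each row, the name of the column holding the largest value (here, the single 1). This is only an illustration on a made-up three-row frame with four soil flags, not the real 40-column data, and it assumes exactly one flag is set per row.

```python
import pandas as pd

# Hypothetical 3-row frame with only four soil flags, for illustration.
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "Soil_Type1": [0, 0, 1],
    "Soil_Type2": [0, 1, 0],
    "Soil_Type3": [1, 0, 0],
    "Soil_Type4": [0, 0, 0],
})

flag_cols = ["Soil_Type" + str(i) for i in range(1, 5)]

# idxmax(axis=1) returns, per row, the name of the column holding the 1.
winning_col = df[flag_cols].idxmax(axis=1)

# Strip the prefix and convert the remaining digits to an integer label.
soil_type = winning_col.str.replace("Soil_Type", "", regex=False).astype(int)

print(soil_type.tolist())  # [3, 2, 1]
```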

I used the following code, but my DataFrame still has the 40 soil type columns. I need to remove those columns from df. How can I do that?

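import pandas

df = pandas.read_csv(ifname)  # ifname: path to the input CSV
df['Soil'] = 0
for i in range(1, 41):
    # each flag is 0 or 1, so only the set flag contributes its index i
    df['Soil'] = df['Soil'] + i*df['Soil_Type'+str(i)]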

print(df)
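
The weighted sum in the loop above only produces a valid label when each row has exactly one Soil_Type flag set; with no flag set the sum is 0, and with several it is a mix of indices. A minimal sketch of a check for that, on a hypothetical three-row frame whose last row is deliberately invalid:

```python
import pandas as pd

# Hypothetical frame: the last row has two flags set, which is invalid.
df = pd.DataFrame({
    "Soil_Type1": [1, 0, 1],
    "Soil_Type2": [0, 1, 1],
    "Soil_Type3": [0, 0, 0],
})

flag_cols = [c for c in df.columns if c.startswith("Soil_Type")]

# Number of flags set in each row; a valid row has exactly one.
flags_per_row = df[flag_cols].sum(axis=1)
bad_rows = df.index[flags_per_row != 1].tolist()

print(bad_rows)  # [2]
```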

Below is an example of what I need:

Id,Elevation,Aspect,Slope,Horizontal_Distance_To_Hydrology,Vertical_Distance_To_Hydrology,Horizontal_Distance_To_Roadways,Hillshade_9am,Hillshade_Noon,Hillshade_3pm,Horizontal_Distance_To_Fire_Points,Cover_Type,Soil,Wilderness_Area
1,2596,51,3,258,0,510,221,232,148,6279,5,29,1
2,2590,56,2,212,-6,390,220,235,151,6225,5,29,1
3,2804,139,9,268,65,3180,234,238,135,6121,2,12,1
4,2785,155,18,242,118,3090,238,238,122,6211,2,30,1
5,2595,45,2,153,-1,391,220,234,150,6172,5,29,1
6,2579,132,6,300,-15,67,230,237,140,6031,2,29,1
7,2606,45,7,270,5,633,222,225,138,6256,5,29,1
8,2605,49,4,234,7,573,222,230,144,6228,5,29,1
9,2617,45,9,240,56,666,223,221,133,6244,5,29,1
10,2612,59,10,247,11,636,228,219,124,6230,5,29,1

Answer 1

You're almost there; you just need to drop the columns after assigning the values:

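# Collect the one-hot column names before adding the new columns
soil_type_cols = [col for col in df if 'Soil_Type' in col]
wilderness_cols = [col for col in df if 'Wilderness_Area' in col]

# Accumulate i * flag; plain assignment would keep only the last iteration
df['Soil'] = 0
for i in range(1, 41):
    df['Soil'] += i*df['Soil_Type'+str(i)]

df['Wilderness_Area'] = 0
for i in range(1, 5):
    df['Wilderness_Area'] += i*df['Wilderness_Area'+str(i)]

# Drop the original one-hot columns
df = df.drop(soil_type_cols+wilderness_cols, axis=1)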
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 14 columns):
Id                                    10 non-null int64
Elevation                             10 non-null int64
Aspect                                10 non-null int64
Slope                                 10 non-null int64
Horizontal_Distance_To_Hydrology      10 non-null int64
Vertical_Distance_To_Hydrology        10 non-null int64
Horizontal_Distance_To_Roadways       10 non-null int64
Hillshade_9am                         10 non-null int64
Hillshade_Noon                        10 non-null int64
Hillshade_3pm                         10 non-null int64
Horizontal_Distance_To_Fire_Points    10 non-null int64
Cover_Type                            10 non-null int64
Soil                                  10 non-null int64
Wilderness_Area                       10 non-null int64
dtypes: int64(14)
memory usage: 1.2 KB
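
As a possible variation, the per-column loop can be replaced by a single dot product of the 0/1 flags with the weights 1..N, wrapped in a helper that also drops the flag columns. `collapse_flags` is a hypothetical name introduced here for illustration, the sample frame is made up, and the sketch assumes exactly one flag is set per row.

```python
import numpy as np
import pandas as pd

def collapse_flags(df, prefix, count, new_name):
    """Replace prefix1..prefixN 0/1 columns with one integer column.

    Hypothetical helper; assumes exactly one flag is set per row.
    """
    cols = [prefix + str(i) for i in range(1, count + 1)]
    weights = np.arange(1, count + 1)
    # Dot product of each 0/1 row with [1..count] gives the index of the set flag.
    labels = df[cols].to_numpy().dot(weights)
    # Drop the flag columns, then append the new label column at the end.
    return df.drop(columns=cols).assign(**{new_name: labels})

# Made-up two-row frame with only the wilderness flags.
df = pd.DataFrame({
    "Id": [1, 2],
    "Wilderness_Area1": [1, 0],
    "Wilderness_Area2": [0, 0],
    "Wilderness_Area3": [0, 1],
    "Wilderness_Area4": [0, 0],
    "Cover_Type": [5, 2],
})
out = collapse_flags(df, "Wilderness_Area", 4, "Wilderness_Area")
print(out.columns.tolist())  # ['Id', 'Cover_Type', 'Wilderness_Area']
print(out["Wilderness_Area"].tolist())  # [1, 3]
```

On the full data the same helper would be called twice, once with `("Soil_Type", 40, "Soil")` and once with `("Wilderness_Area", 4, "Wilderness_Area")`.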

