Compare and delete columns on a dataframe

Question

check here to see the df picture

Python:

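I have a dataframe in which some of the genre columns are duplicated. I want to merge the columns that belong to the same genre, keeping the value "1" if any of them has it.

For example, 0genero_adventure has the value "0" and 1genero_adventure has the value "1", so I want to keep "1".

This should apply not only to these examples but to the whole table (there are more duplicated genre columns).

Thanks in advance :)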

Answer 1

I would store the genres in a list and loop over them: if either of the two columns is 1, keep 1; otherwise keep 0.

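import numpy as np

genres = ["action", "adventure"]  # ... add the remaining genre names
for col in genres:
    # Use the element-wise | with parentheses; Python's `or` raises an error on Series
    df[col] = np.where((df["0genero_" + col] == 1) | (df["1genero_" + col] == 1), 1, 0)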

Then drop the remaining columns you don't need.
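To make the loop and the final drop step concrete, here is a minimal sketch on made-up data. The sample dataframe, the "title" column, and the two genres are assumptions for illustration only; the real table has more genre columns, and the sketch assumes each genre appears exactly as a "0genero_" and a "1genero_" column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's dataframe
df = pd.DataFrame({
    "title": ["a", "b", "c"],
    "0genero_adventure": [0, 1, 0],
    "1genero_adventure": [1, 0, 0],
    "0genero_action": [0, 0, 0],
    "1genero_action": [0, 1, 0],
})

genres = ["action", "adventure"]  # assumed subset of the real genre list

for col in genres:
    first, second = "0genero_" + col, "1genero_" + col
    # 1 if either duplicated column holds a 1, otherwise 0
    df[col] = np.where((df[first] == 1) | (df[second] == 1), 1, 0)
    # Remove the two source columns once they have been merged
    df = df.drop(columns=[first, second])

print(df)
```

After the loop, only "title" and one column per genre remain.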

Answer 2

If I understand your question correctly, the code below should do what you need. You will first need to create a list containing the genre names.

genre_list = ["genero_Adventure", "genero_Biography", "genero_Comedy"]  #Add all the genre names like this

Then this loop should do the job:

for genre in genre_list:
    # Collect every column whose name contains this genre name
    genre_cols_list = [col for col in df.columns if genre in col]

    # The row-wise maximum is 1 if any of the duplicated columns holds a 1
    merged = df[genre_cols_list].max(axis=1)

    # Drop the duplicated columns, then store the result under the plain genre name
    df = df.drop(columns=genre_cols_list)
    df[genre] = merged
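As a rough illustration of the row-wise maximum approach, the sketch below wraps the loop in a helper and runs it on made-up data. The function name `merge_genre_columns`, the sample dataframe, and the "title" column are hypothetical. Matching is by substring, as in the answer, so it assumes no genre name is contained in another genre's column names.

```python
import pandas as pd

def merge_genre_columns(df, genre_list):
    """Hypothetical helper: collapse duplicated genre columns into one per genre."""
    out = df.copy()
    for genre in genre_list:
        # Every column whose name contains the genre name (substring match)
        genre_cols = [col for col in out.columns if genre in col]
        if not genre_cols:
            continue  # nothing to merge for this genre
        # Row-wise maximum: 1 if any duplicate holds a 1, otherwise 0
        merged = out[genre_cols].max(axis=1)
        out = out.drop(columns=genre_cols)
        out[genre] = merged
    return out

# Hypothetical sample data with three duplicates of one genre
df = pd.DataFrame({
    "title": ["a", "b", "c"],
    "0genero_Adventure": [0, 1, 0],
    "1genero_Adventure": [1, 0, 0],
    "2genero_Adventure": [0, 0, 0],
    "0genero_Comedy": [0, 0, 1],
    "1genero_Comedy": [0, 0, 0],
})

result = merge_genre_columns(df, ["genero_Adventure", "genero_Comedy"])
print(result)
```

Because the maximum is taken across all matching columns, this works for any number of duplicates per genre, not just two. The genre names in the list must match the case used in the column names.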

python

data-cleaning

data-science