Drop rows from a dataframe where every value from the third column onwards is 0

Question

I am trying to drop the rows in which every value from the third column onwards is 0.

I used the code below and it works, but I feel there must be a more efficient way to do this. Here is what I currently run on my dataframe:

NRC_lexicon_wide = NRC_lexicon_wide[~((NRC_lexicon_wide['anger'] == 0) & (NRC_lexicon_wide['anticipation'] == 0) 
                                      & (NRC_lexicon_wide['disgust'] == 0) & (NRC_lexicon_wide['fear'] == 0) 
                                      & (NRC_lexicon_wide['negative'] == 0) & (NRC_lexicon_wide['positive'] == 0) 
                                      & (NRC_lexicon_wide['sadness'] == 0) & (NRC_lexicon_wide['surprise'] == 0)
                                      & (NRC_lexicon_wide['trust'] == 0))]

Answer 1

OK, how about this:

import pandas
import numpy


# Creating a dataframe from a list of dicts automatically infers the column names
df = pandas.DataFrame([{key: numpy.random.choice([0, 1, 2], p=[0.8, 0.15, 0.05]) for key in ["ColA", "ColB", "ColC", "ColD", "ColE", "ColF"]} for _ in range(50)])

# Start from this column onwards (0-based position, so 2 is the third column)
start_column = 2

# Get a boolean value for each cell, indicating whether the value is larger than 0
# (this assumes the values are never negative; compare with != 0 otherwise)
larger_than_zero = df.loc[:, df.columns[start_column:]] > 0

# Get the rows in which at least one cell is larger than 0
any_cell_larger_than_zero = larger_than_zero.any(axis=1)

# Select only the rows that have at least one cell larger than 0
df = df.loc[any_cell_larger_than_zero]

# Or in a single line:
df = df.loc[(df.loc[:, df.columns[start_column:]] > 0).any(axis=1)]
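
If the values can also be negative, a variant that tests for "every cell equals 0" directly may be safer than testing for "any cell larger than 0". Below is a minimal sketch of that idea. The small table is a made-up stand-in for the asker's dataframe (its real columns and values are not shown in the question), and `drop_all_zero_rows` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

# Hypothetical miniature of the asker's table; the real data is not shown in the question.
df = pd.DataFrame({
    "word": ["abandon", "table", "joy", "chair"],
    "id": [1, 2, 3, 4],
    "anger": [0, 0, 0, 0],
    "fear": [1, 0, 0, 0],
    "positive": [0, 0, 1, 0],
})


def drop_all_zero_rows(frame, start=2):
    """Return a copy of `frame` without the rows whose columns from
    0-based position `start` onwards are all equal to 0."""
    # True for each row in which every selected cell equals 0
    all_zero = frame.iloc[:, start:].eq(0).all(axis=1)
    # Keep the remaining rows; the original frame is left untouched
    return frame.loc[~all_zero].copy()


result = drop_all_zero_rows(df)
print(result)
```

Selecting by position with `iloc` avoids listing every column name, which is what made the original expression so long.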

python

row

dataframe