Pandas read_csv skiprows with conditional statements

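I have a bunch of txt files that I need to compile into one master file. I use read_csv to extract the information from them. Some rows need to be dropped, and I was wondering whether it is possible to use the skiprows feature without specifying the index numbers of the rows to drop, and instead tell it which rows to drop based on their content/value. Here is what the data looks like, to illustrate my point.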

Index     Column 1          Column 2
0         Rows to drop      Rows to drop
1         Rows to drop      Rows to drop
2         Rows to drop      Rows to drop
3         Rows to keep      Rows to keep
4         Rows to keep      Rows to keep
5         Rows to keep      Rows to keep
6         Rows to keep      Rows to keep
7         Rows to drop      Rows to drop
8         Rows to drop      Rows to drop
9         Rows to keep      Rows to keep
10        Rows to drop      Rows to drop
11        Rows to keep      Rows to keep
12        Rows to keep      Rows to keep
13        Rows to drop      Rows to drop
14        Rows to drop      Rows to drop
15        Rows to drop      Rows to drop

What is the most efficient way to do this?

No. skiprows does not let you drop rows based on their content/value.

From the pandas documentation:

skiprows : list-like, int or callable, optional
Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file. If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].
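
To make the quoted behaviour concrete, here is a minimal sketch using an in-memory file. The sample text and the names `raw` and `df_skip` are made up for illustration. The callable only ever receives the integer line index, never the text of the line, which is why it cannot filter on content.

```python
import io
import pandas as pd

# Hypothetical sample data; line 0 is the header line.
raw = "col\na\nb\nc\nd\n"

# The callable is evaluated against each line index (0, 1, 2, ...).
# Here lines 1 and 3 (the values "a" and "c") are skipped.
df_skip = pd.read_csv(io.StringIO(raw), skiprows=lambda x: x in [1, 3])
print(df_skip)
```

Only the rows holding "b" and "d" should remain, since the decision was made purely from the line positions.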

Since you cannot do this with skiprows, I think this approach is efficient:

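import pandas as pd

df = pd.read_csv(filePath)  # filePath: path to one of the txt files

# keep only the rows whose value marks them as rows to keep
df = df.loc[df['column1'] == "Rows to keep"]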

Is this what you are trying to achieve?

import pandas as pd
df = pd.DataFrame({'A':['row 1','row 2','drop row','row 4','row 5',
                        'drop row','row 6','row 7','drop row','row 9']})

df1 = df[df['A']!='drop row']

print(df)
print(df1)

Original DataFrame:

          A
0     row 1
1     row 2
2  drop row
3     row 4
4     row 5
5  drop row
6     row 6
7     row 7
8  drop row
9     row 9

New DataFrame with the rows dropped:

       A
0  row 1
1  row 2
3  row 4
4  row 5
6  row 6
7  row 7
9  row 9
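
Since the question is about merging several text files into one master file, the same filter could be applied to each file before concatenating. The following is only a sketch: the in-memory `files` dict stands in for real files on disk, and the file names, the column name `A`, and the `drop row` marker are assumptions borrowed from the example above.

```python
import io
import pandas as pd

# Stand-ins for real text files (hypothetical names and contents).
files = {
    "a.txt": "A\nrow 1\ndrop row\nrow 2\n",
    "b.txt": "A\ndrop row\nrow 3\n",
}

frames = []
for name, text in files.items():
    # With real files this would be pd.read_csv(path) instead.
    part = pd.read_csv(io.StringIO(text))
    # Keep only the rows that are not marked for dropping.
    frames.append(part[part["A"] != "drop row"])

# Stack the filtered pieces and renumber the index from 0.
master = pd.concat(frames, ignore_index=True)
print(master)
```

Passing `ignore_index=True` avoids carrying over the gaps that filtering leaves in each file's original index.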

Although you cannot skip rows based on content, you can skip them based on index. Here are some options to choose from:

Skip n rows:

df = pd.read_csv('xyz.csv', skiprows=2)
#this will skip 2 rows from the top

Skip specific rows:

df = pd.read_csv('xyz.csv', skiprows=[0,2,5])
#this will skip rows 1, 3, and 6 from the top
#remember row 0 is the 1st line

Skip every nth row in the file:

#you can also skip by counts.
#In the example below, skip the 0th row and every 5th row from there on

def check_row(a):
    # a is the 0-based line index; skip it when it is a multiple of 5
    return a % 5 == 0

df = pd.read_csv('xyz.txt', skiprows=check_row)
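
If the rows really must be skipped at read time, one possible workaround is a two-pass approach: scan the raw lines first, collect the indices of the lines whose text matches, and pass that list to skiprows. This is a sketch under assumptions: the sample text, the `drop` marker, and the helper name `lines_to_skip` are hypothetical, and the first pass does a plain substring check on the raw line rather than on a parsed column.

```python
import io
import pandas as pd

# Hypothetical sample data; line 0 is the header line.
raw = "A,B\nkeep,1\ndrop,2\nkeep,3\ndrop,4\n"

def lines_to_skip(text, marker):
    """Return the 0-based indices of the lines that contain `marker`."""
    return [i for i, line in enumerate(text.splitlines()) if marker in line]

# First pass: find the line indices to skip by looking at the content.
skip = lines_to_skip(raw, "drop")

# Second pass: let read_csv skip exactly those line indices.
df_two_pass = pd.read_csv(io.StringIO(raw), skiprows=skip)
print(df_two_pass)
```

This assumes every record occupies exactly one line of the file. It also reads the file twice, so filtering the DataFrame after loading is usually simpler unless the unwanted lines would break parsing.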

More details can be found in the link about skiprows.