pandas failing with variable columns

Question

My file looks like this:

    4 7 a a
    s g 6 8 0 d
    g 6 2 1 f 7 9 
    f g 3 
    1 2 4 6 8 9 0

I am using pandas to load it as a pandas object, but I get the following error:
pandas.parser.CParserError: Error tokenizing data. C error: Expected 6 fields in line 3, saw 8

The code I am using is:
file = pd.read_csv("a.txt",dtype = None,delimiter = " ")

Can anyone suggest a way to read a file like this?

Answer 1

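pandas raises this error because read_csv expects a fixed number of columns, in this case 6, but it finds 8 fields when it reaches the third line. One way to handle this is to skip the lines that have more fields than the parser expects. This can be done with the error_bad_lines parameter. This is what the documentation says about error_bad_lines: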

error_bad_lines : boolean, default True Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these “bad lines” will dropped from the DataFrame that is returned. (Only valid with C parser)

So you can do this:

>>> file = pd.read_csv("a.txt",dtype = None,delimiter = " ",error_bad_lines=False)
Skipping line 3: expected 6 fields, saw 8
Skipping line 5: expected 6 fields, saw 7

>>> file
     4    7    a  a.1
s g  6  8.0  0.0    d
f g  3  NaN  NaN  NaN
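
On newer pandas versions the call above needs a small change: error_bad_lines was deprecated in pandas 1.3 and removed in 2.0, and on_bad_lines="skip" takes its place. The following is a minimal sketch, assuming pandas 1.3 or later. It uses io.StringIO in place of a.txt (an assumption made only so the snippet is self-contained), and it keeps the trailing spaces that appear on lines 3 and 4 of the question's file, since a single-space delimiter counts each of them as an extra empty field.

```python
import io

import pandas as pd

# Contents of the question's file, written out explicitly so the
# trailing spaces on lines 3 and 4 stay visible.
text = (
    "4 7 a a\n"
    "s g 6 8 0 d\n"
    "g 6 2 1 f 7 9 \n"
    "f g 3 \n"
    "1 2 4 6 8 9 0\n"
)

# on_bad_lines="skip" silently drops lines with too many fields
# (lines 3 and 5 here) instead of raising an error.
df = pd.read_csv(io.StringIO(text), delimiter=" ", on_bad_lines="skip")
print(df)
```

Use on_bad_lines="warn" instead if you still want a message for each skipped line.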

Alternatively, you can use the skiprows parameter to skip whichever lines you choose. This is what the documentation says about skiprows:

skiprows : list-like or integer, default None Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file
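
As a rough sketch of the skiprows approach, the snippet below skips the two over-long lines by their 0-indexed line numbers. The values [2, 4] are specific to this file and would have to be worked out by hand for any other input. io.StringIO stands in for a.txt so the example runs on its own.

```python
import io

import pandas as pd

# Contents of the question's file, including the trailing spaces
# on lines 3 and 4.
text = (
    "4 7 a a\n"
    "s g 6 8 0 d\n"
    "g 6 2 1 f 7 9 \n"
    "f g 3 \n"
    "1 2 4 6 8 9 0\n"
)

# Line numbers are 0-indexed, so 2 and 4 are the third and fifth lines,
# the two lines that have more than 6 fields.
df = pd.read_csv(io.StringIO(text), delimiter=" ", skiprows=[2, 4])
print(df)
```

Like the previous option, this discards the data on the skipped lines rather than reading it.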

Answer 2

Here is one way to do it.

In [50]: !type temp.csv
4,7,a,a
s,g,6,8,0,d
g,6,2,1,f,7,9
f,g,3
1,2,4,6,8,9,0

Read the csv into a list of lists, then convert it to a DataFrame.

In [51]: pd.DataFrame([line.strip().split(',') for line in open('temp.csv', 'r')])
Out[51]:
   0  1  2     3     4     5     6
0  4  7  a     a  None  None  None
1  s  g  6     8     0     d  None
2  g  6  2     1     f     7     9
3  f  g  3  None  None  None  None
4  1  2  4     6     8     9     0
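
The same idea can be applied to the space-separated file from the question. This is only a sketch: io.StringIO stands in for open('a.txt') so the snippet is self-contained, and str.split() is called without arguments so that the trailing spaces on some lines do not produce empty fields.

```python
import io

import pandas as pd

# Contents of the question's file, including the trailing spaces
# on lines 3 and 4.
text = (
    "4 7 a a\n"
    "s g 6 8 0 d\n"
    "g 6 2 1 f 7 9 \n"
    "f g 3 \n"
    "1 2 4 6 8 9 0\n"
)

# split() with no arguments splits on any run of whitespace and
# ignores leading and trailing whitespace.
rows = [line.split() for line in io.StringIO(text)]

# Shorter rows are padded with missing values on the right.
df = pd.DataFrame(rows)
print(df)
```

Every line is kept this way, but all values come back as strings, so numeric columns need an explicit conversion afterwards.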
