pandas read_csv throwing ValueError: Invalid file path or buffer object type: <class 'list'>

I want to read a CSV file that is passed as a command-line argument. I thought I could use argparse's FileType object directly, but I am getting an error.

from argparse import ArgumentParser, FileType
from pandas import read_csv

if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument("input_file_path", help="Input CSV file", type=FileType('r'), nargs=1)
    df = read_csv(parser.parse_args().input_file_path, sep="|")
    print(df.to_string())

pandas read_csv is not able to read the FileType object when I run the program as shown below. What is missing?

python csv_splitter.py test.csv

Traceback (most recent call last):
  File "csv_splitter.py", line 7, in <module>
    df = read_csv(parser.parse_args().input_file_path, sep="|")
  File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py", line 605, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py", line 457, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py", line 814, in __init__
    self._engine = self._make_engine(self.engine)
  File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py", line 1045, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py", line 1862, in __init__
    self._open_handles(src, kwds)
  File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py", line 1357, in _open_handles
    self.handles = get_handle(
  File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\common.py", line 558, in get_handle
    ioargs = _get_filepath_or_buffer(
  File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\common.py", line 371, in _get_filepath_or_buffer
    raise ValueError(msg)
ValueError: Invalid file path or buffer object type: <class 'list'>

pd.read_csv cannot read a list of files; it reads only one file at a time.

To read multiple files into a single DataFrame, use pd.concat with a generator:

df = pd.concat(pd.read_csv(p) for p in paths)

Or pd.concat with map:

df = pd.concat(map(pd.read_csv, paths))
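
For a version that can be run without any files on disk, here is a minimal sketch. The in-memory StringIO buffers and their contents are made up for illustration and stand in for real paths; sep="|" is taken from the question, and ignore_index=True is an optional extra that was not in the snippets above.

```python
import io

import pandas as pd

# Two small in-memory "files" stand in for real CSV paths (hypothetical data).
sources = [
    io.StringIO("a|b\n1|2\n"),
    io.StringIO("a|b\n3|4\n"),
]

# Read each source on its own, then stack the results into one DataFrame.
# ignore_index=True renumbers the rows instead of repeating each file's index.
df = pd.concat((pd.read_csv(s, sep="|") for s in sources), ignore_index=True)
print(df.to_string())
```

Each call to read_csv still receives exactly one source; only pd.concat ever sees the whole collection.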

In the OP's case, even though nargs=1 restricts the argument parser to a single file, it still returns a list containing that one file object:

print(parser.parse_args().input_file_path)
# [ <_io.TextIOWrapper> ]
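
The list-wrapping comes from nargs itself, not from FileType. A small sketch with plain string arguments shows the difference; the argument names with_nargs and without_nargs, and the file names passed in, are hypothetical.

```python
from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument("with_nargs", nargs=1)  # value is stored as a one-item list
parser.add_argument("without_nargs")        # value is stored as-is

# Pass the arguments explicitly instead of reading sys.argv.
args = parser.parse_args(["test.csv", "other.csv"])

print(args.with_nargs)     # ['test.csv']
print(args.without_nargs)  # other.csv
```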

So just index the single file:

df = pd.read_csv(parser.parse_args().input_file_path[0])
#                                                   ^^^
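
An alternative, if only one file is ever expected, is to leave out nargs=1 so that argparse stores the file object itself and no indexing is needed. The sketch below assumes that setup; the temporary directory and the sample contents of test.csv are invented so the example can run on its own.

```python
import os
import tempfile
from argparse import ArgumentParser, FileType

import pandas as pd

parser = ArgumentParser()
# No nargs here, so argparse stores the open file object, not a list.
parser.add_argument("input_file_path", help="Input CSV file", type=FileType("r"))

with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny pipe-separated file to parse (hypothetical sample data).
    csv_path = os.path.join(tmp, "test.csv")
    with open(csv_path, "w") as f:
        f.write("a|b\n1|2\n")

    args = parser.parse_args([csv_path])
    # Close the handle argparse opened once pandas has finished reading it.
    with args.input_file_path as handle:
        df = pd.read_csv(handle, sep="|")

print(df.to_string())
```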