Csvkit Library Usage

Question

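I want to convert a given Excel file to CSV using csvkit as a library rather than from the command line. I can't find any information on the library usage syntax. Can anyone shed light on how to use csvkit as a library for this purpose?

My test case is simple: take input.xlsx or input.xls, convert it, and save it as output.csv. Here is what I have tried so far, based on suggestions found elsewhere: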

import csvkit

with open('input.xlsx') as csvfile:
    reader = in2csv(csvfile)
    # below is just to test whether the file could be accessed
    for row in reader:
        print(row)

This gives:

Traceback (most recent call last):
  File "excelconvert.py", line 6, in <module>
    reader = in2csv(csvfile)
NameError: name 'in2csv' is not defined

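There is a similar question, but its answers merely point to the documentation, which either no longer exists or doesn't actually explain the library usage syntax; it just lists the classes. One answer suggested the syntax might resemble that of the csv module, which is what I attempted above, but I got nowhere.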

Answer 1

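The documentation strongly suggests that this is a command-line tool, not something meant to be used from inside the Python interpreter. You can convert a file to CSV from the command line like this (or put it in a shell script):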

in2csv your_file.xlsx > your_new_file.csv

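If you just want to read the file, do this (it is similar to what you had, but you don't need any external modules, only built-in Python). Note that an .xlsx file is a binary zip container, so this yields raw bytes rather than spreadsheet rows:

with open('input.xlsx', 'rb') as csvfile:  # binary mode: .xlsx is not a text file
    reader = csvfile.readlines()  # replaces the in2csv(csvfile) call
    # below is just to test whether the file could be accessed
    for row in reader:
        print(row)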

Or you can call the command line via the os module:

import os

# Careful: this is a raw shell call. Use subprocess.Popen
# if you need to accept untrusted user input here.
os.popen("in2csv your_file.xlsx > your_new_file.csv").read()
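The comment above points to subprocess for untrusted input. A minimal sketch of that variant follows, assuming the in2csv executable is on PATH and Python 3.5+ is in use; the function name convert_with_in2csv and its parameters are hypothetical, introduced only for illustration:

```python
import subprocess


def convert_with_in2csv(src_path, dst_path, executable="in2csv"):
    """Run `<executable> <src_path>` and save its standard output to dst_path.

    Hypothetical helper for this sketch; `executable` defaults to the
    in2csv command that csvkit installs.
    """
    # The command is passed as a list, so no shell is involved and
    # src_path is never parsed as shell syntax.
    with open(dst_path, "wb") as out:
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run([executable, src_path], stdout=out, check=True)
    return dst_path
```

It would be called as convert_with_in2csv("your_file.xlsx", "your_new_file.csv"). Redirecting stdout to a file object takes the place of the shell's > operator.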

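One of the snippets above is probably all you need, but if you really want to punish yourself, you can try to use in2csv from inside the interpreter. Here is how you might go about it (nothing in the documentation I could find supports this; it is just me poking around in the interpreter):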

>>> from csvkit import in2csv
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name in2csv
>>> import csvkit
>>> help(csvkit)
Help on package csvkit:

NAME
    csvkit

FILE
    c:\python27\lib\site-packages\csvkit\__init__.py

DESCRIPTION
    This module contains csvkit's superpowered alternative to the standard Python
    CSV reader and writer. It can be used as a drop-in replacement for the standard
    module.

    .. warn::

        Since version 1.0 csvkit relies on `agate <http://agate.rtfd.org>`_'s
    CSV reader and writer. This module is supported for legacy purposes only and you
    should migrate to using agate.

PACKAGE CONTENTS
    cleanup
    cli
    convert (package)
    exceptions
    grep
    utilities (package)

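So you can't import in2csv directly from csvkit (it is not listed under PACKAGE CONTENTS). With a bit of searching, however, you will find that the module is reachable through csvkit.utilities. From here on things only get worse. If you do more of this "help hunting" (i.e. calling help from the interpreter), you will find that the class was designed to be used from the command line, so using it from inside the interpreter is a real pain. Here is an example of trying to use it with the defaults (it blows up):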

>>> from csvkit.utilities import in2csv
>>> i = in2csv.In2CSV()
>>> i.main()
usage:  [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b]
        [-p ESCAPECHAR] [-z FIELD_SIZE_LIMIT] [-e ENCODING] [-S] [-H] [-v]
        [-l] [--zero] [-f FILETYPE] [-s SCHEMA] [-k KEY] [--sheet SHEET]
        [-y SNIFF_LIMIT] [--no-inference]
        [FILE]
: error: You must specify a format when providing data via STDIN (pipe).

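Looking at the in2csv.py module, you have to monkey-patch args to make it do what you want from inside the interpreter. Again, this was not designed for use inside the interpreter; it was designed to be called from the command line (which is why args is defined when you call it that way). The following seems to run, but I haven't tested it thoroughly: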

>>> from csvkit.utilities import in2csv
>>> i = in2csv.In2CSV()
>>> from collections import namedtuple
>>> i.args = namedtuple("patched_args", "input_path filetype no_inference")
>>> i.args.input_path = "/path/to/your/file.xlsx"
>>> i.args.no_inference = True
>>> i.args.filetype = None
>>> i.main()
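A possibly less invasive alternative to patching args by hand is to pass the command-line arguments to the class and capture its output in memory. The sketch below rests on an assumption that is not documented as public API: that in csvkit 1.x and later under Python 3 the utility classes accept (args, output_file) in their constructor and expose a run() method. The helper names run_csvkit_utility and xlsx_to_csv are hypothetical.

```python
import io


def run_csvkit_utility(utility_cls, cli_args):
    """Instantiate a csvkit-style utility and return what it writes as text.

    Assumption: utility_cls(args, output_file) and .run() exist, as they
    appear to for csvkit's utility classes; this is not a documented API.
    """
    buffer = io.StringIO()
    utility_cls(cli_args, buffer).run()
    return buffer.getvalue()


def xlsx_to_csv(src_path, dst_path):
    """Convert src_path to CSV text via In2CSV and write it to dst_path."""
    # Imported lazily so the helper above stays usable without csvkit installed.
    from csvkit.utilities.in2csv import In2CSV

    csv_text = run_csvkit_utility(In2CSV, [src_path])
    # newline="" keeps the line endings exactly as the utility wrote them.
    with open(dst_path, "w", newline="", encoding="utf-8") as out:
        out.write(csv_text)
```

If the assumption holds, xlsx_to_csv("input.xlsx", "output.csv") would cover the test case from the question. The list passed as cli_args takes the same form as the arguments typed after in2csv on the command line.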
