Column name not found, probably because 'Latin-1' characters are recognized as UTF-8

Question

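I'm using Python 2.7 with the Pandas library in a Jupyter Notebook, and I'm facing the following problem:

I have a dataset that contains accented characters. To load the data from a .csv file, I wrote the following code: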

import pandas as pd

datafile = pd.read_csv("exportacionEmitidas.csv", delimiter=";",
                       encoding='latin-1', low_memory=False)

These are the columns I get, which look fine to me:

Nº Serie + Nº Factura
Ejercicio
Periodo
Fecha Expedición
Fecha Operacion 
NIF Destinatario
Nombre o Razón Social Destinatario

However, when I try to create a new DataFrame that contains only some of the columns, I get the following annoying message:

  datafile[["Nº Serie + Nº Factura","Fecha Expedición"]]
KeyError: "['N\xc2\xba Serie + N\xc2\xba Factura' 'Fecha Expedici\xc3\xb3n'] not in index"

I don't want to select the columns by position, because I want to avoid mistakes if the column order changes.

Answer 1

Your column names are Unicode objects, not byte strings. Use Unicode literals (prefixed with u) to address them:

datafile[[u"Nº Serie + Nº Factura", u"Fecha Expedición"]]
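As a rough way to try this without the original file, the sketch below builds a tiny Latin-1 encoded CSV in memory and selects two columns with Unicode labels. The sample rows, the io.BytesIO buffer, and the names sample_text, wanted, and subset are assumptions made for illustration; only the read_csv arguments come from the question. The non-ASCII characters are written as escapes (\xba for º, \xf3 for ó) so the snippet does not depend on how the source file is saved.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for exportacionEmitidas.csv.
sample_text = (
    u"N\xba Serie + N\xba Factura;Ejercicio;Fecha Expedici\xf3n\n"
    u"A-1;2018;01/02/2018\n"
    u"A-2;2018;03/02/2018\n"
)

# Encode as Latin-1 to mimic the bytes stored on disk.
raw = io.BytesIO(sample_text.encode("latin-1"))

# Same delimiter and encoding as in the question.
datafile = pd.read_csv(raw, delimiter=";", encoding="latin-1")

# Unicode labels match the Unicode column names that pandas decoded.
wanted = [u"N\xba Serie + N\xba Factura", u"Fecha Expedici\xf3n"]
subset = datafile[wanted]
print(subset.shape)
```

Selecting by label like this keeps working if the column order in the file changes, which is what the question asks for.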

You can see this when you echo all the column names:

>>> datafile.columns
Index([u'Nº Serie + Nº Factura', u'Ejercicio', u'Periodo', u'Fecha Expedición',
       u'Fecha Operacion', u'NIF Destinatario',
       u'Nombre o Razón Social Destinatario'],
      dtype='object')

Each column name is echoed using the same u'...' string literal syntax.
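To illustrate where the \xc2\xba and \xc3\xb3 sequences in the KeyError come from, here is a small sketch. It assumes the notebook passed the plain "..." literals as UTF-8 bytes, which is what the error message suggests; the variable names are made up for the example.

```python
# The column name as Unicode text, which is how pandas stores it
# after decoding the file with encoding='latin-1'.
column_name = u"N\xba Serie + N\xba Factura"

# The same text encoded as UTF-8: this is the byte string that a
# plain "..." literal holds in Python 2 when the source is UTF-8.
utf8_bytes = column_name.encode("utf-8")

# The same text encoded as Latin-1: this is what is stored in the .csv.
latin1_bytes = column_name.encode("latin-1")

print(repr(utf8_bytes))
print(repr(latin1_bytes))

# Decoding the UTF-8 bytes gives back the Unicode column name.
round_trip = utf8_bytes.decode("utf-8")
```

The UTF-8 form carries the extra \xc2 byte before \xba, so the byte string is not the same object type or content as the Unicode column label.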

Note that to be able to use non-ASCII characters in such string literals, you must declare a codec at the top of your Python source file:

# coding: UTF-8
# The above states this source file is saved using UTF-8.

You may want to move to Python 3. Python 3 is much more Unicode-aware, and Python 2 will no longer be supported in 18 months (its end of life is January 1, 2020).
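A quick way to check the difference between the two versions is the sketch below, which runs on either interpreter. The names plain, prefixed, and plain_is_text are illustrative only.

```python
import sys

plain = "Fecha Expedici\xf3n"       # no u prefix
prefixed = u"Fecha Expedici\xf3n"   # explicit Unicode literal

# True on Python 3, where str is Unicode text;
# False on Python 2, where str is a byte string.
plain_is_text = isinstance(plain, type(prefixed))

print(sys.version_info[0], plain_is_text)
```

On Python 3 both literals are the same kind of object, so the u prefix is accepted but no longer needed for column lookups.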

python

iso-8859-1

python-2.7

pandas