DataFrame slicing in Python fails

Question

I want to slice my data in Python. The very basic task of slicing my dataframe throws an unexpected error.

My code is:

import pandas as pd

test_file = pd.read_csv("C:/Users/Lenovo/Desktop/testfile.csv")
test_select = test_file[["Category", "Shop"]]
print(test_select[1,1])

The code print(test_select[1,1]) should display the second row of the second column.

The error message:

  File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: (1, 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/Lenovo/.PyCharmCE2018.1/config/scratches/Dictionary.py", line 8, in <module>
    print(h_select[1,1])
  File "C:\Users\Lenovo\PycharmProjects\mindnotez\venv\lib\site-packages\pandas\core\frame.py", line 2688, in __getitem__
    return self._getitem_column(key)
  File "C:\Users\Lenovo\PycharmProjects\mindnotez\venv\lib\site-packages\pandas\core\frame.py", line 2695, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Users\Lenovo\PycharmProjects\mindnotez\venv\lib\site-packages\pandas\core\generic.py", line 2489, in _get_item_cache
    values = self._data.get(item)
  File "C:\Users\Lenovo\PycharmProjects\mindnotez\venv\lib\site-packages\pandas\core\internals.py", line 4115, in get
    loc = self.items.get_loc(item)
  File "C:\Users\Lenovo\PycharmProjects\mindnotez\venv\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: (1, 1)

When I run print(test_select.head()), I get the following output:

     Category           Shop
0       Jidlo         Albert
1       Jidlo          BILLA
2       Jidlo         Albert
3       Jidlo         Albert
4  Restaurant  Kockafé Freyd

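Slicing the dataframe with print(test_select[1:4]) prints rows 1 to 3. With the command print(test_select[1,1]), I want the second column, second row. However, I get the error message above.

Why do I get a KeyError exception? What am I missing?

I am using:

Python 3.7
PyCharm
Anaconda (installed)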

Answer 1

When you want to slice a dataframe

By row number

df.iloc[[1, 5]] # to get rows 1 and 5

df.iloc[1:6] # to get rows 1 to 5 inclusive

You can also narrow this down to a specific column as follows (to avoid chained indexing)

df.iloc[[1, 5], df.columns.get_loc('Shop')]

or to multiple columns

df.iloc[[1, 5], df.columns.get_indexer(['Shop', 'Category'])]
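The positional calls above can be tried on a small stand-in frame. This is only a sketch: the DataFrame is built from the five rows shown in the question's head() output (an assumption, since the real CSV is not available), so it uses position 4 instead of 5 to stay within bounds.

```python
import pandas as pd

# Stand-in for the CSV in the question; values copied from its head() output.
df = pd.DataFrame({
    "Category": ["Jidlo", "Jidlo", "Jidlo", "Jidlo", "Restaurant"],
    "Shop": ["Albert", "BILLA", "Albert", "Albert", "Kockafé Freyd"],
})

# Rows at positions 1 and 4 (a list of positions returns a DataFrame).
rows = df.iloc[[1, 4]]

# Rows at positions 1, 2 and 3 (the end of a positional slice is excluded).
block = df.iloc[1:4]

# Same rows, restricted to one column looked up by name -> a Series.
shops = df.iloc[[1, 4], df.columns.get_loc("Shop")]

# Same rows, restricted to several columns looked up by name -> a DataFrame.
both = df.iloc[[1, 4], df.columns.get_indexer(["Shop", "Category"])]

print(shops.tolist())
print(list(both.columns))
```

Translating column names to positions with get_loc / get_indexer keeps everything in a single iloc call, which is what avoids chained indexing.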

By label

# Numeric
df.loc[[1, 5]] # 1 and 5 are considered labels here
df.loc[[1, 5], 'Shop']
df.loc[[1, 5], ['Shop', 'Category']]

# Textual or otherwise
df.set_index('Shop', inplace=True)
df.loc[['BILLA', 'Albert'], 'Category']

Answer 2

The code print(test_select[1,1]) should display the second row of the second column.

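No, it shouldn't. The syntax df[x] is generally reserved for retrieving a column (a Series), Boolean row indexing, or row slicing. These uses of pd.DataFrame.__getitem__, for which df[] is syntactic sugar, are not conveniently documented. In general, treat them as shortcuts; if you are unsure, prefer loc / iloc / at / iat, as appropriate.

To retrieve a scalar value by integer position, you can use pd.DataFrame.iat: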

df.iat[1, 1]
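A minimal sketch of what happens, assuming a stand-in frame built from the question's head() output rather than the real CSV: df[1, 1] hands the tuple (1, 1) to __getitem__, which looks for a column with that label and raises KeyError because no such column exists.

```python
import pandas as pd

# Stand-in for the CSV in the question; values copied from its head() output.
df = pd.DataFrame({
    "Category": ["Jidlo", "Jidlo", "Jidlo", "Jidlo", "Restaurant"],
    "Shop": ["Albert", "BILLA", "Albert", "Albert", "Kockafé Freyd"],
})

# df[1, 1] is df[(1, 1)]: a lookup of the column labelled (1, 1).
raised = False
try:
    df[1, 1]
except KeyError:
    raised = True

# What df[...] does support:
column = df["Shop"]                               # a single column
row_slice = df[1:4]                               # row slice, positions 1..3
restaurants = df[df["Category"] == "Restaurant"]  # Boolean row mask

# Scalar access by integer position: second row, second column.
value = df.iat[1, 1]
print(raised, value)
```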

Answer 3

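Use loc, which selects by index and column labels rather than by position. Here your index appears to run from 0 to n, so for a single row loc and iloc pick the same row (when slicing, note that a loc slice includes its end label while an iloc slice excludes its end position)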

df.loc[1,'Shop']
'BILLA'
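The equivalence only holds while the index is the default 0 to n range. The sketch below, again on a stand-in frame built from the question's head() output (an assumption), reverses the rows to show where label-based and position-based access diverge.

```python
import pandas as pd

# Stand-in for the CSV in the question; values copied from its head() output.
df = pd.DataFrame({
    "Category": ["Jidlo", "Jidlo", "Jidlo", "Jidlo", "Restaurant"],
    "Shop": ["Albert", "BILLA", "Albert", "Albert", "Kockafé Freyd"],
})

# Default index: label 1 and position 1 are the same row.
same = df.loc[1, "Shop"] == df.iloc[1, 1]

# Slices differ even with the default index.
label_slice = df.loc[1:3]      # labels 1, 2, 3 (end included)
position_slice = df.iloc[1:3]  # positions 1, 2 (end excluded)

# Reverse the rows: the index labels are now 4, 3, 2, 1, 0.
rev = df.iloc[::-1]
by_label = rev.loc[1, "Shop"]   # row whose label is 1
by_position = rev.iloc[1, 1]    # second row of the reversed frame (label 3)

print(same, by_label, by_position)
```

After filtering or sorting, the index labels no longer match positions, so it helps to decide explicitly which of the two you mean.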

Answer 4

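If you want the second row and second column, you must use df.iloc[1,1]. iloc extracts data by integer position.

[1,1] takes the row at position 1 and the column at position 1, counting from 0. The output will be 'BILLA'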
