pandas Series subsetting by index

Question

Here is my example:

import pandas as pd
df = pd.DataFrame({'col_1':[1,5,6,77,9],'col_2':[6,2,4,2,5]})
df.index = [8,9,10,11,12]

This subsetting goes by row order (position):

df.col_1[2:5]

returns

10     6
11    77
12     9
Name: col_1, dtype: int64

whereas this subsetting goes by index label and does not work:

df.col_1[2]

returns:

KeyError: 2

I find this confusing. What is the reason behind it?

Answer 1

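Your statements are ambiguous, so it is best to state explicitly what you want.

df.col_1[2:5] works like df.col_1.iloc[2:5], using integer positions.

df.col_1[2] works like df.col_1.loc[2], using index labels. No row in the index is labeled 2, so you get a KeyError.

It is therefore best to say explicitly whether you want integer positions, with .iloc, or index labels, with .loc.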
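
As a minimal sketch of that difference, using the DataFrame from the question (the names by_position, by_label and raised_key_error are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'col_1': [1, 5, 6, 77, 9], 'col_2': [6, 2, 4, 2, 5]})
df.index = [8, 9, 10, 11, 12]

# .iloc always uses integer positions: the rows at positions 2, 3 and 4.
by_position = df.col_1.iloc[2:5]
print(by_position.tolist())   # [6, 77, 9]

# .loc always uses index labels: the row whose label is 10.
by_label = df.col_1.loc[10]
print(by_label)               # 6

# No row is labeled 2, so .loc raises KeyError.
raised_key_error = False
try:
    df.col_1.loc[2]
except KeyError:
    raised_key_error = True
print(raised_key_error)       # True
```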

See the pandas Indexing docs.

Answer 2

Let's assume this is the initial DataFrame:

df = pd.DataFrame(
    {
        'col_1': [1, 5, 6, 77, 9],
        'col_2': [6, 2, 4, 2, 5],
    },
    index=list('abcde'),
)

df
Out: 
   col_1  col_2
a      1      6
b      5      2
c      6      4
d     77      2
e      9      5

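The index consists of strings, so it is usually obvious what you are trying to do:

df['col_1']['b']: you passed a string, so you are probably trying to access by label. It returns 5.
df['col_1'][1]: you passed an integer, so you are probably trying to access by position. It returns 5.
Slices are handled the same way: df['col_1']['b':'d'] uses labels and df['col_1'][1:4] uses positions.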
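
The same four lookups can be sketched with explicit accessors, which behave the same whatever the index type. The variable s is only an illustrative name. Note that a label slice includes its end label, while a position slice excludes its end position. Recent pandas releases have also deprecated the positional fallback for integer keys in plain [] indexing, which is one more reason to prefer the explicit forms.

```python
import pandas as pd

df = pd.DataFrame(
    {'col_1': [1, 5, 6, 77, 9], 'col_2': [6, 2, 4, 2, 5]},
    index=list('abcde'),
)
s = df['col_1']

# Label-based access: the row labeled 'b'.
print(s.loc['b'])                 # 5
# Position-based access: the second row.
print(s.iloc[1])                  # 5

# Label slices include the end label: 'b', 'c' and 'd'.
print(s.loc['b':'d'].tolist())    # [5, 6, 77]
# Position slices exclude the end position: positions 1, 2 and 3.
print(s.iloc[1:4].tolist())       # [5, 6, 77]
```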

When the index is also made of integers, nothing is obvious anymore.

df = pd.DataFrame(
    {
        'col_1': [1, 5, 6, 77, 9],
        'col_2': [6, 2, 4, 2, 5],
    },
    index=[8, 9, 10, 11, 12],
)

df
Out: 
    col_1  col_2
8       1      6
9       5      2
10      6      4
11     77      2
12      9      5

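Suppose you type df['col_1'][8]. Do you want to access by label or by position? What if it is a slice? Nobody knows. At this point pandas picks one of the two depending on the usage. The object is ultimately a Series, and what distinguishes a Series from an array is its labels, so for df['col_1'][8] the choice is labels. Slicing by label is less common, so pandas tries to be smart here and uses positions when you pass a slice. Is it inconsistent? Yes. Should you avoid it? Yes. This is the main reason ix was deprecated.

Explicit is better than implicit, so use iloc or loc whenever there is ambiguity. loc always treats the key as a label and raises KeyError if no such label exists, even if the key would be a valid position. iloc always treats the key as a position and raises IndexError if the position is out of bounds, even if the key is a valid label.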
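
A small sketch of how the two accessors fail on the integer-indexed Series. The helper describe is hypothetical, written only for this illustration:

```python
import pandas as pd

s = pd.Series([1, 5, 6, 77, 9], index=[8, 9, 10, 11, 12], name='col_1')

def describe(accessor, key):
    """Return the value at key, or the name of the exception raised."""
    try:
        return accessor[key]
    except (KeyError, IndexError) as exc:
        return type(exc).__name__

print(describe(s.loc, 8))    # 1 (8 is a label)
print(describe(s.iloc, 2))   # 6 (third row)
print(describe(s.loc, 2))    # KeyError (no row is labeled 2)
print(describe(s.iloc, 8))   # IndexError (only positions 0 to 4 exist)
```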
