Getting a subset of arrays from a pandas data frame

Question

I have a numpy array named arr that contains 1154 elements.

array([502, 502, 503, ..., 853, 853, 853], dtype=int64)

I have a data frame named df:

    team    Count
0   512     11
1   513     21
2   515     18
3   516     8
4   517     4

How do I get a subset of the data frame df that contains only the values found in the array arr?

For example:

team         count
arr1_value1    45
arr1_value2    67

To make this question more clear: I have a numpy array ['45', '55', '65']

I have a data frame as follows:

team  count
34      156
45      189
53       90
65       99
23       77
55       91

I need a new data frame as follows:

team    count
 45      189
 55       91
 65       99

Answer 1

You can use the DataFrame.loc method.

Using your example (note that team is the index):

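import numpy as np
import pandas as pd

arr = np.array(['45', '55', '65'])
frame = pd.DataFrame([156, 189, 90, 99, 77, 91], index=['34', '45', '53', '65', '23', '55'], columns=['count'])
ans = frame.loc[arr]  # select rows by index label, in the order given by arr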

This kind of indexing is type-sensitive, so if frame.index holds ints, make sure your indexing array is also of int type, not str as in this example.
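To illustrate the type issue, here is a minimal sketch that assumes the frame has an integer index while the array holds strings, as in the question. The name labels is only illustrative; converting with astype(int) is one possible way to make the types match.

```python
import numpy as np
import pandas as pd

# Assumption: the index is int, but the lookup array holds strings
arr = np.array(['45', '55', '65'])
frame = pd.DataFrame({'count': [156, 189, 90, 99, 77, 91]},
                     index=[34, 45, 53, 65, 23, 55])

# Convert the string labels to int so they match the index dtype
labels = arr.astype(int)
ans = frame.loc[labels]
```

The rows come back in the order of labels, not in the order they appear in frame.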

Answer 2

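I am answering the question asked after "To make this question more clear". As a side note: you could have provided the first 4 lines yourself, so that I would not have to type them in, which can also introduce errors or misunderstandings.

The idea is to create a boolean Series to use as an index and then simply build a new data frame from it. I am just getting started with pandas, so maybe this can be done more efficiently.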

import numpy as np
import pandas as pd

# starting with the df and teams as string
df = pd.DataFrame(data={'team': [34, 45, 53, 65, 23, 55], 'count': [156, 189, 90, 99, 77, 91]})
teams = np.array(['45', '55', '65'])

# we want the team number as int
teams_int = [int(t) for t in teams]

# mini function to check whether the team is to be kept
def filter_teams(x):
    return x in teams_int

# create the series as index and only keep those values from our original df
index = df['team'].apply(filter_teams)
df_filtered = df[index]

It returns this data frame:

count  team
1    189    45
3     99    65
5     91    55

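Note that in this case df_filtered uses 1, 3, 5 as its index (the index of the original data frame). Your question is unclear on this point, because the index is not shown to us.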
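If a fresh 0-based index is wanted instead of the original row labels, one option is reset_index(drop=True). This is a minimal sketch on the same data; the name df_renumbered is only illustrative, and whether renumbering is desired depends on what the question intended.

```python
import pandas as pd

df = pd.DataFrame(data={'team': [34, 45, 53, 65, 23, 55],
                        'count': [156, 189, 90, 99, 77, 91]})
teams_int = [45, 55, 65]

# Boolean Series: True where the team should be kept
keep = df['team'].apply(lambda x: x in teams_int)
df_filtered = df[keep]                    # keeps the original labels 1, 3, 5

# Discard the old labels and renumber the rows from 0
df_renumbered = df_filtered.reset_index(drop=True)
```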

Answer 3

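I don't know whether this is a typo, but your array values look like strings. Assuming it is a typo and they are actually integers, you can filter your df by calling isin: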

In [6]:

a = np.array([45, 55, 65])
df[df.team.isin(a)]
Out[6]:
   team  count
1    45    189
3    65     99
5    55     91
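If the array really does hold strings, as it appears in the question, one way to still use isin is to convert the values to integers first. A sketch under that assumption (the names a_str, mask and result are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'team': [34, 45, 53, 65, 23, 55],
                   'count': [156, 189, 90, 99, 77, 91]})

# Assumption: the lookup values arrive as strings
a_str = np.array(['45', '55', '65'])

# Convert to int so the values are comparable with the int team column
mask = df['team'].isin(a_str.astype(int))
result = df[mask]
```

Rows come back in the original order of df, not in the order of the array.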

python

numpy

python-2.7

pandas