Python Pandas - Group into a list of named tuples

I have the following data:

from io import StringIO
import pandas as pd
import collections

stg = """
target predictor  value
10     predictor1     A
10     predictor1     C
10     predictor2     1
10     predictor2     2
10     predictor3     X
20     predictor1     A
20     predictor2     3
20     predictor3     Y
30     predictor1     B
30     predictor2     1
30     predictor3     X
40     predictor1     B
40     predictor2     2
40     predictor2     3
40     predictor3     X
40     predictor3     Y
50     predictor1     C
50     predictor2     3
50     predictor3     Y
60     predictor1     C
60     predictor2     4
60     predictor3     Z
"""

I do this to get the list of predictors and values that share the same list of targets:

src = pd.read_csv(StringIO(stg), sep=r"\s+", dtype=str)

grouped = src.groupby(["predictor","value"])['target'].apply(','.join).reset_index()

print(grouped)

    predictor value    target
0  predictor1     A     10,20
1  predictor1     B     30,40
2  predictor1     C  10,50,60
3  predictor2     1     10,30
4  predictor2     2     10,40
5  predictor2     3  20,40,50
6  predictor2     4        60
7  predictor3     X  10,30,40
8  predictor3     Y  20,40,50
9  predictor3     Z        60

From here, I ultimately want to create, for each list of targets, a list of named tuples representing the predictors and values:

Predicate = collections.namedtuple('Predicate',('predictor', 'value'))
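For anyone new to named tuples: a `Predicate` built from this definition behaves like an ordinary 2-tuple whose fields can also be read by name. A minimal sketch, with example values taken from the data above:

```python
import collections

# Same definition as above: a named tuple with two fields.
Predicate = collections.namedtuple('Predicate', ('predictor', 'value'))

p = Predicate('predictor1', 'A')
print(p.predictor, p.value)  # fields accessed by name
print(p[0], p[1])            # or by position, like a plain tuple
pred, val = p                # and unpacked like any other tuple
```

Because it is still a tuple, a `Predicate` compares equal to the plain tuple `('predictor1', 'A')`.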

Edit:

To clarify, I want to create a list of predicates so that, in a separate process, I can iterate over them and construct query strings like these:

#target 10,20
data_frame.query('predictor1 == "A"')

#target 10,30
data_frame.query('predictor2 == "1"')

#target 10,30,40
data_frame.query('predictor3 == "X"')

#target 20,40,50
data_frame.query('predictor2 == "3" or predictor3 == "Y"')
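One way to turn a list of `Predicate`s into such a query string is to join the individual comparisons with `or`. This is only a sketch: `build_query` is a hypothetical helper, and the wide `data_frame` (one string column per predictor) is an assumption, since the question doesn't show it. Note that `DataFrame.query` uses `==` for comparison. Values are wrapped in double quotes without escaping, so this assumes no value contains a double quote.

```python
import collections

import pandas as pd

Predicate = collections.namedtuple('Predicate', ('predictor', 'value'))


def build_query(predicates):
    """Hypothetical helper: OR together one `column == "value"` test per Predicate."""
    return ' or '.join(f'{p.predictor} == "{p.value}"' for p in predicates)


preds = [Predicate('predictor2', '3'), Predicate('predictor3', 'Y')]
query = build_query(preds)
print(query)  # predictor2 == "3" or predictor3 == "Y"

# Hypothetical wide table: one string column per predictor (assumed, not from the question).
data_frame = pd.DataFrame({
    'predictor1': ['A', 'B', 'C'],
    'predictor2': ['3', '1', '4'],
    'predictor3': ['Y', 'X', 'Z'],
})
print(data_frame.query(query))  # only the first row matches
```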

I tried taking the target lists and building a list of predictors and values for each one, like this:

grouped_list = grouped.groupby('target').agg(lambda x: x.tolist())

print(grouped_list)

                         predictor   value
target                                    
10,20                 [predictor1]     [A]
10,30                 [predictor2]     [1]
10,30,40              [predictor3]     [X]
10,40                 [predictor2]     [2]
10,50,60              [predictor1]     [C]
20,40,50  [predictor2, predictor3]  [3, Y]
30,40                 [predictor1]     [B]
60        [predictor2, predictor3]  [4, Z]

This gives me 2 columns, each containing a list. I can iterate over the rows like this:

for index, row in grouped_list.iterrows():

    print("--------")
    for pred in row["predictor"]:

        print(pred)

But I can't see how to get from here to something like this (it doesn't work, but hopefully it illustrates what I mean):

for index, row in grouped_list.iterrows():

    Predicates=[]
    for pred, val in row["predicate","value"] :

        Predicates.append(Predicate(pred, val))

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2563, in get_value
    return libts.get_value_box(s, key)
  File "pandas/_libs/tslib.pyx", line 1018, in pandas._libs.tslib.get_value_box
  File "pandas/_libs/tslib.pyx", line 1026, in pandas._libs.tslib.get_value_box
TypeError: 'tuple' object cannot be interpreted as an integer
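The error comes from `row["predicate","value"]`: on a Series, pandas looks that up as a single key, the tuple `("predicate", "value")` (the column is also named `predictor`, not `predicate`), and no such label exists. To walk the two list columns in step, `zip` them. A sketch, using a small hand-built stand-in for `grouped_list` (two of its rows typed in directly, an assumption made so the snippet runs on its own):

```python
import collections

import pandas as pd

Predicate = collections.namedtuple('Predicate', ('predictor', 'value'))

# Stand-in for grouped_list: two list-valued columns indexed by the target list.
grouped_list = pd.DataFrame(
    {'predictor': [['predictor1'], ['predictor2', 'predictor3']],
     'value': [['A'], ['3', 'Y']]},
    index=pd.Index(['10,20', '20,40,50'], name='target'),
)

predicates_by_target = {}
for target, row in grouped_list.iterrows():
    # zip pairs the i-th predictor with the i-th value from the two list columns.
    predicates_by_target[target] = [Predicate(pred, val)
                                    for pred, val in zip(row['predictor'], row['value'])]

print(predicates_by_target)
```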

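Any pointers would be much appreciated. I'm new to Python, so I'm working through the problem step by step; there may well be a better way to achieve the above.

Cheers

David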

I think you need a list comprehension:

L = [Predicate(x.predictor, x.value) for x in grouped.itertuples()]
print (L)

[Predicate(predictor='predictor1', value='A'), 
 Predicate(predictor='predictor1', value='B'), 
 Predicate(predictor='predictor1', value='C'), 
 Predicate(predictor='predictor2', value='1'), 
 Predicate(predictor='predictor2', value='2'), 
 Predicate(predictor='predictor2', value='3'), 
 Predicate(predictor='predictor2', value='4'), 
 Predicate(predictor='predictor3', value='X'), 
 Predicate(predictor='predictor3', value='Y'), 
 Predicate(predictor='predictor3', value='Z')]
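A variant of the same idea, in case it's useful: `itertuples(index=False, name=None)` yields plain tuples of just the selected columns, and the namedtuple class method `Predicate._make` converts each one. The three-row `grouped` below is a hand-copied excerpt of the real one, for illustration only:

```python
import collections

import pandas as pd

Predicate = collections.namedtuple('Predicate', ('predictor', 'value'))

# A few rows copied from `grouped` above.
grouped = pd.DataFrame({'predictor': ['predictor1', 'predictor1', 'predictor2'],
                        'value': ['A', 'B', '1'],
                        'target': ['10,20', '30,40', '10,30']})

# Plain (predictor, value) tuples, each turned into a Predicate by _make.
L = [Predicate._make(t)
     for t in grouped[['predictor', 'value']].itertuples(index=False, name=None)]
print(L)
```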

Edit:

d = {k:[Predicate(x.predictor, x.value) for x in v.itertuples()] 
                                                for k,v in grouped.groupby('target')}
print (d)

{'10,30': [Predicate(predictor='predictor2', value='1')], 
 '30,40': [Predicate(predictor='predictor1', value='B')], 
 '20,40,50': [Predicate(predictor='predictor2', value='3'),
              Predicate(predictor='predictor3', value='Y')], 
 '10,30,40': [Predicate(predictor='predictor3', value='X')], 
 '10,40': [Predicate(predictor='predictor2', value='2')], 
 '10,20': [Predicate(predictor='predictor1', value='A')],
 '60': [Predicate(predictor='predictor2', value='4'), 
        Predicate(predictor='predictor3', value='Z')], 
 '10,50,60': [Predicate(predictor='predictor1', value='C')]}
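Putting the pieces together, here is a sketch that goes from the raw data in the question to one query string per target list. It assumes the `or`-joined `column == "value"` form from the question's edit, with string-typed columns. `sep=r"\s+"` is used in place of `delim_whitespace=True`, which newer pandas versions deprecate.

```python
import collections
from io import StringIO

import pandas as pd

Predicate = collections.namedtuple('Predicate', ('predictor', 'value'))

stg = """
target predictor  value
10     predictor1     A
10     predictor1     C
10     predictor2     1
10     predictor2     2
10     predictor3     X
20     predictor1     A
20     predictor2     3
20     predictor3     Y
30     predictor1     B
30     predictor2     1
30     predictor3     X
40     predictor1     B
40     predictor2     2
40     predictor2     3
40     predictor3     X
40     predictor3     Y
50     predictor1     C
50     predictor2     3
50     predictor3     Y
60     predictor1     C
60     predictor2     4
60     predictor3     Z
"""

# Read the whitespace-separated table, keeping every column as a string.
src = pd.read_csv(StringIO(stg), sep=r'\s+', dtype=str)

# For each (predictor, value) pair, the comma-joined list of targets that have it.
grouped = src.groupby(['predictor', 'value'])['target'].apply(','.join).reset_index()

# For each target list, the Predicates that share it.
d = {k: [Predicate(x.predictor, x.value) for x in v.itertuples()]
     for k, v in grouped.groupby('target')}

# OR together column == "value" comparisons, one query string per target list.
queries = {k: ' or '.join(f'{p.predictor} == "{p.value}"' for p in preds)
           for k, preds in d.items()}

for target, query in queries.items():
    print(f'#target {target}')
    print(query)
```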