How Do I Return a Different Value When Iterating Over a List of Lists

Question

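I have a FOR loop that creates a list of lists, where each entry holds an input and its associated output. I can't figure out how to iterate over the outputs once the list is built and return the corresponding input. I was able to solve my problem by converting the list to a DataFrame and using .loc[], but I'm stubborn and want to get the same result without converting to a DataFrame. I also don't want to convert it to a dictionary, because I've already solved it that way too.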

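I've included the generated list as well as the converted DataFrame, which works. In this case best_tree_size should return 100, because its output is the smallest result.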

Current working code

    import pandas as pd

    # get_mae, train_X, val_X, train_y and val_y are defined earlier (not shown)
    candidate_max_leaf_nodes = [5, 25, 50, 100, 250, 500]

    # list placeholder for the loop's results
    leaf_list = []

    # Loop to find the ideal tree size from candidate_max_leaf_nodes
    for max_leaf_nodes in candidate_max_leaf_nodes:
        # each iteration produces a 2-item list [leaf, MAE], which is appended to leaf_list
        leaf_list.append([max_leaf_nodes, get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)])

    # convert the list of lists into a DataFrame
    scores = pd.DataFrame(leaf_list, columns=['Leaf', 'MAE'])

    # Store the best value of max_leaf_nodes (it will be 5, 25, 50, 100, 250 or 500)
    # idxmin() finds the index label of the minimum MAE
    # .loc uses that index label to return the corresponding Leaf value
    best_tree_size = scores.loc[scores.MAE.idxmin(), 'Leaf']

    # clear the list placeholder (if needed)
    leaf_list.clear()

The resulting leaf_list:

    [[5, 35044.51299744237],
     [25, 29016.41319191076],
     [50, 27405.930473214907],
     [100, 27282.50803885739],
     [250, 27893.822225701646],
     [500, 29454.18598068598]]

Converted scores DataFrame: [image]

Answer

So you have a list of [leaf, MAE] pairs, and you want the item from that list with the smallest MAE? You can do it like this:

    from operator import itemgetter

    scores = [
        [5, 35044.51299744237],
        [25, 29016.41319191076],
        [50, 27405.930473214907],
        [100, 27282.50803885739],
        [250, 27893.822225701646],
        [500, 29454.18598068598],
    ]

    # compare the [leaf, MAE] rows by their MAE (index 1)
    best_leaf, best_mae = min(scores, key=itemgetter(1))

    # best_leaf will be equal to 100, best_mae will be equal to 27282.50803885739

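The key here is itemgetter(1), which returns a callable that, when passed a tuple or list, returns the element at index 1 (here, the MAE). We use it as the key for min() so that elements are compared by their MAE value.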
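For comparison, here is a minimal sketch of the same lookup with a lambda instead of itemgetter (the name mae_of is only for illustration). It also shows one detail worth knowing: when several rows share the smallest MAE, min() returns the first one it encounters, so only one row comes back.

```python
from operator import itemgetter

# [leaf, MAE] pairs as posted in the question
leaf_list = [
    [5, 35044.51299744237],
    [25, 29016.41319191076],
    [50, 27405.930473214907],
    [100, 27282.50803885739],
    [250, 27893.822225701646],
    [500, 29454.18598068598],
]

# itemgetter(1) and this lambda both pull the MAE (index 1) out of a row
mae_of = itemgetter(1)
assert mae_of(leaf_list[3]) == (lambda row: row[1])(leaf_list[3])

# Same lookup without the import: compare rows by MAE, keep the leaf
best_tree_size = min(leaf_list, key=lambda row: row[1])[0]
print(best_tree_size)  # 100

# When several rows share the smallest MAE, min() returns the first one
tied = [[14, 6], [25, 55], [5, 6]]
first_best = min(tied, key=itemgetter(1))
print(first_best)  # [14, 6]
```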

Answer

NumPy style:

    import numpy as np

    leaf_list = [
        [5, 35044.51299744237],
        [25, 29016.41319191076],
        [50, 27405.930473214907],
        [100, 27282.50803885739],
        [250, 27893.822225701646],
        [500, 29454.18598068598],
    ]
    # convert to a NumPy array
    leaf_list = np.array(leaf_list)
    # reduce to one dimension
    flatten = leaf_list.flatten()
    # every second item is an output (MAE); comparing only these avoids matching a leaf value by accident
    outputs = flatten[1::2]
    # indices of the rows whose output equals the minimum
    index = np.where(outputs == outputs.min())[0]
    # output list: every row that attains the minimum
    out_list = leaf_list[index]

Output:

    array([[  100.        , 27282.50803886]])
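If only a single best row is needed, np.argmin on the MAE column gives its position directly, without flattening. This is a swapped-in alternative rather than part of the answer: like min(), argmin returns the first occurrence on ties, so it will not list all minima.

```python
import numpy as np

leaf_list = np.array([
    [5, 35044.51299744237],
    [25, 29016.41319191076],
    [50, 27405.930473214907],
    [100, 27282.50803885739],
    [250, 27893.822225701646],
    [500, 29454.18598068598],
])

# Row position of the smallest MAE (column 1); on ties the first one wins
best_row = int(np.argmin(leaf_list[:, 1]))

# Leaf (column 0) of that row; the array holds floats, so cast back to int
best_tree_size = int(leaf_list[best_row, 0])
print(best_row, best_tree_size)  # 3 100
```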

The flatten-based version also returns every row when there are multiple minima (identical values):

    leaf_list = [[14,  6],
                 [25, 55],
                 [5,   6]]

    # ... same code

Output:

    array([[14,  6],
           [ 5,  6]])
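The same "all minima" result can be had in plain Python, which fits the original wish to avoid conversions: find the smallest output once, then keep every row that matches it. A minimal sketch; the helper rows_with_min_output is a hypothetical name, not from either answer.

```python
def rows_with_min_output(rows):
    """Return every [input, output] row whose output equals the smallest output."""
    lowest = min(output for _, output in rows)
    # Exact equality is safe: `lowest` is one of the stored values, not a recomputed one
    return [row for row in rows if row[1] == lowest]


leaf_list = [[14, 6], [25, 55], [5, 6]]
best_rows = rows_with_min_output(leaf_list)
print(best_rows)  # [[14, 6], [5, 6]]

# Just the inputs that produced the minimum
best_leaves = [leaf for leaf, _ in best_rows]
print(best_leaves)  # [14, 5]
```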