Occurrence frequency of a list against each row in a Pandas dataframe

Question

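Suppose I have a list of 6 integers named "base" and a dataframe with 100,000 rows and 6 columns of integers.

I need to create an additional column that shows how often the values in the list "base" occur in each row of the dataframe.

The order of the integers, both in the list "base" and in the dataframe rows, is ignored.

The occurrence frequency can range from 0 to 6.
0 means that none of the 6 integers in the list "base" matches any of the 6 columns in a row of the dataframe.

Can anyone shed some light on this?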

Answer 1

You can try this:

import pandas as pd

# create frame with six columns of ints
df = pd.DataFrame({'a':[1,2,3,4,10],
                   'b':[8,5,3,2,11],
                   'c':[3,7,1,8,8],
                   'd':[3,7,1,8,8],
                   'e':[3,1,1,8,8],
                   'f':[7,7,1,8,8]})

# list of ints
base =[1,2,3,4,5,6]

# define function to count membership of list
def base_count(y):
    return sum(True for x in y if x in base)

# apply the function row wise using the axis =1 parameter
df.apply(base_count, axis=1)

Output:

0    4
1    3
2    6
3    2
4    0
dtype: int64

Then assign it to a new column:

df['g'] = df.apply(base_count, axis=1)
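With 100,000 rows, a row-wise apply can be slow. A vectorized alternative is sketched below, using DataFrame.isin and a boolean sum across columns. It rebuilds the same sample frame so it runs on its own; the names cols, cell_matches and distinct_matches are only illustrative. Note that this count, like base_count, counts matching cells, so a repeated value in a row is counted each time. If each value of base should be counted at most once (an assumption about the intent of the question), the set-intersection variant at the end may be closer to what is wanted.

```python
import pandas as pd

# same sample data as above
df = pd.DataFrame({'a': [1, 2, 3, 4, 10],
                   'b': [8, 5, 3, 2, 11],
                   'c': [3, 7, 1, 8, 8],
                   'd': [3, 7, 1, 8, 8],
                   'e': [3, 1, 1, 8, 8],
                   'f': [7, 7, 1, 8, 8]})
base = [1, 2, 3, 4, 5, 6]
cols = ['a', 'b', 'c', 'd', 'e', 'f']  # the six integer columns

# isin() marks a cell True when its value is in base;
# summing the booleans across columns (axis=1) counts matching cells per row
cell_matches = df[cols].isin(base).sum(axis=1)

# variant: count how many distinct values of base appear in each row
base_set = set(base)
distinct_matches = df[cols].apply(lambda row: len(base_set.intersection(row)), axis=1)

df['g'] = cell_matches
print(cell_matches.tolist())      # [4, 3, 6, 2, 0]
print(distinct_matches.tolist())  # [2, 3, 2, 2, 0]
```

The first result matches the output of base_count above; the second differs only for rows that contain repeated values.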
