In a column of a dataframe, count the number of list elements starting with "a"

Question

I have a dataset like this:

import pandas as pd

data = {'ID': ['first_value', 'second_value', 'third_value',
               'fourth_value', 'fifth_value', 'sixth_value'],
        'list_id': [['001', 'ab0', '44A'], [], ['005', '006'],
                    ['a22'], ['azz'], ['aaa', 'abd']]
        }
df = pd.DataFrame(data)

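I want to create two columns:

A column counting the elements in 'list_id' that start with "a"
A column counting the elements in 'list_id' that do not start with "a"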

I was thinking of doing something like:

data['list_id'].apply(lambda x: for entity in x if x.startswith("a")

I wanted to count the elements starting with "a" first, and then the ones that don't, so I tried this:

sum(1 for w in data["list_id"] if w.startswith('a'))

Either way, this didn't really work, and I couldn't get it to work.
Any ideas? :)
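Both attempts fail for concrete reasons. The lambda version is a SyntaxError, because a lambda body must be a single expression and a bare `for` is not one. The generator version iterates over `data["list_id"]`, whose items are the lists themselves, so `w` is a list and `w.startswith` raises AttributeError. The sketch below (a minimal illustration, not taken from either answer) reproduces the second error and then shows the first idea made to work by running the generator over each row's list and testing each element:

```python
import pandas as pd

data = {'ID': ['first_value', 'second_value', 'third_value',
               'fourth_value', 'fifth_value', 'sixth_value'],
        'list_id': [['001', 'ab0', '44A'], [], ['005', '006'],
                    ['a22'], ['azz'], ['aaa', 'abd']]}
df = pd.DataFrame(data)

# Iterating over data["list_id"] yields whole lists, not strings,
# so calling startswith on w raises AttributeError.
try:
    sum(1 for w in data["list_id"] if w.startswith('a'))
except AttributeError as exc:
    print(exc)  # 'list' object has no attribute 'startswith'

# Working version of the apply idea: x is one row's list, and the
# startswith test is applied to each element (entity) of that list.
df['starts_with_a'] = df['list_id'].apply(
    lambda x: sum(1 for entity in x if entity.startswith("a")))
print(df['starts_with_a'].tolist())  # [1, 0, 0, 1, 1, 2]
```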

Answer 1

Assuming this input:

             ID          list_id
0   first_value  [001, ab0, 44A]
1  second_value               []
2   third_value       [005, 006]
3  fourth_value            [a22]
4   fifth_value            [azz]
5   sixth_value       [aaa, abd]

You can use:

sum(1 for l in data['list_id'] for x in l if x.startswith('a'))

Output: 5
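If you prefer to stay inside pandas for the overall total, one option (a sketch, assuming pandas 0.25 or later for `Series.explode`) is to put each list element on its own row and use the `.str` accessor. An empty list explodes to a single NaN, which `na=False` counts as "does not start with a":

```python
import pandas as pd

data = {'ID': ['first_value', 'second_value', 'third_value',
               'fourth_value', 'fifth_value', 'sixth_value'],
        'list_id': [['001', 'ab0', '44A'], [], ['005', '006'],
                    ['a22'], ['azz'], ['aaa', 'abd']]}
df = pd.DataFrame(data)

# One row per list element; the empty list in row 1 becomes NaN.
elements = df['list_id'].explode()

# na=False maps the NaN to False instead of propagating it.
total_a = int(elements.str.startswith('a', na=False).sum())
print(total_a)  # 5
```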

If you want the counts per row:

df['starts_with_a'] = [sum(x.startswith('a') for x in l) for l in df['list_id']]
df['starts_with_other'] = df['list_id'].str.len()-df['starts_with_a']

NB. Using a list comprehension is faster than apply.

Output:

             ID          list_id  starts_with_a  starts_with_other
0   first_value  [001, ab0, 44A]              1                  2
1  second_value               []              0                  0
2   third_value       [005, 006]              0                  2
3  fourth_value            [a22]              1                  0
4   fifth_value            [azz]              1                  0
5   sixth_value       [aaa, abd]              2                  0
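The same per-row columns can also be built without a Python-level loop per row, by exploding the lists and grouping back on the original index. This is a sketch of an alternative, not part of the answer above, and its speed relative to the list comprehension has not been measured here:

```python
import pandas as pd

data = {'ID': ['first_value', 'second_value', 'third_value',
               'fourth_value', 'fifth_value', 'sixth_value'],
        'list_id': [['001', 'ab0', '44A'], [], ['005', '006'],
                    ['a22'], ['azz'], ['aaa', 'abd']]}
df = pd.DataFrame(data)

# explode() repeats the original index label for every element, and an
# empty list still yields one NaN row, so every index 0..5 is present.
elements = df['list_id'].explode()

# Sum the boolean matches per original row (index level 0).
df['starts_with_a'] = (elements.str.startswith('a', na=False)
                       .groupby(level=0).sum().astype(int))

# Everything else in the list does not start with "a".
df['starts_with_other'] = df['list_id'].str.len() - df['starts_with_a']
print(df[['starts_with_a', 'starts_with_other']])
```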

Answer 2

Using pandas, something very similar to your proposal:

import pandas as pd

data = {'ID': ['first_value', 'second_value', 'third_value', 'fourth_value', 'fifth_value', 'sixth_value'],
        'list_id': [['001', 'ab0', '44A'], [], ['005', '006'], ['a22'], ['azz'], ['aaa', 'abd']]
        }

df = pd.DataFrame(data)

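# Total number of elements in each row's list
df["len"] = df.list_id.apply(len)

# Number of elements whose first character is "a"
df["num_a"] = df.list_id.apply(lambda s: sum(map(lambda x: x[0] == "a", s)))
# Everything else does not start with "a"
df["num_not_a"] = df["len"] - df["num_a"]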
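One caveat with `x[0] == "a"`: it raises IndexError if a list ever contains an empty string, which the question's data does not, but real data might. The sketch below uses a hypothetical two-row frame with an empty string to show the failure, and swaps in `str.startswith`, which simply returns False for `""`:

```python
import pandas as pd

# Hypothetical data, not from the question: row 0 contains an empty string.
df = pd.DataFrame({'list_id': [['aaa', ''], ['b1']]})

try:
    df.list_id.apply(lambda s: sum(map(lambda x: x[0] == "a", s)))
except IndexError as exc:
    print("x[0] fails:", exc)  # string index out of range

# startswith() handles the empty string without raising.
df["num_a"] = df.list_id.apply(lambda s: sum(map(lambda x: x.startswith("a"), s)))
print(df["num_a"].tolist())  # [1, 0]
```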


Tags: python, dictionary, list, dataframe, pandas