How do I limit the count/sum of a column to 3 per month?

Question

I have this dataset, df:

ID    Name
23    Dan
24    Bob

This dataset shows the relationships of each ID:

ID    ID2       DATE       Status
23    10     2019-06-11     Sent
23    20     2019-06-21     Sent
23    30     2019-06-26     Sent
23    40     2019-06-27     Sent
23    50     2019-12-02     Sent
24    55     2019-06-27     Sent
24    65     2019-06-29     Sent

Here, ID 23 sent letters to IDs 10, 20, 30, 40 and 50 on the dates shown. I want to know how many letters each ID has sent. I did something like this:

# Count the 'Sent' rows for every ID
id_dict = {}
for ID, all_df in df.groupby('ID'):
    letter_count = 0
    for index, row in all_df.iterrows():
        if row['Status'] == 'Sent':
            letter_count = letter_count + 1
    id_dict[ID] = letter_count

# Map each ID's count onto a new column
df['letter_count'] = df['ID'].map(id_dict)
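The loop above can also be written without `iterrows`. A minimal sketch, assuming the letters table is a DataFrame with the `ID` and `Status` columns shown (the variable name `letters` is illustrative): compare `Status` with `'Sent'` and sum the boolean flags per `ID`.

```python
import pandas as pd

# Sample letters table from the question (the name `letters` is illustrative)
letters = pd.DataFrame({
    'ID': [23, 23, 23, 23, 23, 24, 24],
    'ID2': [10, 20, 30, 40, 50, 55, 65],
    'DATE': ['2019-06-11', '2019-06-21', '2019-06-26', '2019-06-27',
             '2019-12-02', '2019-06-27', '2019-06-29'],
    'Status': ['Sent'] * 7,
})

# True where a letter was sent; summing the booleans per ID counts them
letter_count = letters['Status'].eq('Sent').groupby(letters['ID']).sum()
print(letter_count)
```

On the sample data this gives 5 for ID 23 and 2 for ID 24. The resulting Series is indexed by ID, so it can be attached to the names frame with `df['ID'].map(letter_count)`.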

I get this output on df:

ID    Name  letter_count
23    Dan        4
24    Bob        2

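This doesn't take DATE into account. I probably need a new MONTH column, or even a YEAR column. I need to cap the number of letters sent at 3 per month. Here, 4 letters were sent in June, but I need each month's count to stay at 3. This limit should be configurable.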

Required new output:

ID    Name  Month   Year   letter_count
23    Dan    06     2019         3
23    Dan    12     2019         1
24    Bob    06     2019         2
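To see the per-month totals before any cap is applied, one option is to derive `Year` and `Month` from `DATE` with the `.dt` accessor and group on them. A sketch on the sample data (the variable name `letters` is illustrative); for ID 23 it shows the 4 June letters that need capping.

```python
import pandas as pd

# Sample letters table from the question (the name `letters` is illustrative)
letters = pd.DataFrame({
    'ID': [23, 23, 23, 23, 23, 24, 24],
    'ID2': [10, 20, 30, 40, 50, 55, 65],
    'DATE': ['2019-06-11', '2019-06-21', '2019-06-26', '2019-06-27',
             '2019-12-02', '2019-06-27', '2019-06-29'],
    'Status': ['Sent'] * 7,
})

# Parse the dates, then split them into Year and Month columns
letters['DATE'] = pd.to_datetime(letters['DATE'])
letters['Year'] = letters['DATE'].dt.year
letters['Month'] = letters['DATE'].dt.month

# Uncapped number of sent letters per ID and calendar month
monthly = (letters[letters['Status'].eq('Sent')]
           .groupby(['ID', 'Year', 'Month'])
           .size()
           .reset_index(name='letter_count'))
print(monthly)
```

Here `Month` is an integer (6, 12) rather than the zero-padded string `'06'` in the desired output; `dt.strftime('%m')` gives the padded form if that matters.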

Answer 1

Group by ID and get the total count for each ID. Then, if the count is greater than 3, set it to 3.

import numpy as np
# Count letters per ID, then cap any count above 3 at 3
df2 = df.groupby(['ID']).agg({'ID2': 'count'})
df2['ID2'] = np.where(df2['ID2'] > 3, 3, df2['ID2'])
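Since the question asks for the limit to be configurable, the cap can live in a variable, and `Series.clip(upper=...)` does the same job as the `np.where` call. A sketch on the sample data (the names `letters` and `cap` are illustrative). Like the answer above, this caps each ID's overall total, not each month's.

```python
import pandas as pd

# Sample letters table from the question (the name `letters` is illustrative)
letters = pd.DataFrame({
    'ID': [23, 23, 23, 23, 23, 24, 24],
    'ID2': [10, 20, 30, 40, 50, 55, 65],
})

cap = 3  # configurable limit

# Count letters per ID, then cap the totals at `cap`
df2 = letters.groupby(['ID']).agg({'ID2': 'count'})
df2['ID2'] = df2['ID2'].clip(upper=cap)
print(df2)
```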

Answer 2

You can apply clip after the groupby to cap the count at 3:

import pandas as pd

# Parse DATE, then add Year/Month columns and a boolean Sent flag
df['DATE'] = pd.to_datetime(df['DATE'])
df = df.assign(Year=df['DATE'].dt.strftime('%Y'),
               Month=df['DATE'].dt.strftime('%m'),
               Sent=df['Status'].eq('Sent'))

# Your data should look like this at this point:    

   ID  ID2       DATE Status  Year Month  Sent
0  23   10 2019-06-11   Sent  2019    06  True
1  23   20 2019-06-21   Sent  2019    06  True
2  23   30 2019-06-26   Sent  2019    06  True
3  23   40 2019-06-27   Sent  2019    06  True
4  23   50 2019-12-02   Sent  2019    12  True
5  24   55 2019-06-27   Sent  2019    06  True
6  24   65 2019-06-29   Sent  2019    06  True

# Group by ID/Year/Month, count sent letters (summing the boolean Sent flag) and cap at 3:
new_df = df.groupby(['ID', 'Year', 'Month'])['Sent'].sum().clip(upper=3).reset_index()

# Merge back the names (df_name is the ID/Name DataFrame from the question):
new_df = new_df.merge(df_name, on='ID', how='left')

# Which gives you:
   ID  Year Month  Sent Name
0  23  2019    06     3  Dan
1  23  2019    12     1  Dan
2  24  2019    06     2  Bob
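Because the limit should be configurable, the steps above can be wrapped in a function that takes the cap as a parameter. A minimal sketch, assuming the two DataFrames from the question; the function name `monthly_letter_counts` and the frame names `letters`/`names` are hypothetical. It counts only rows whose `Status` is `'Sent'` by summing the boolean flag, then orders the columns like the desired output.

```python
import pandas as pd


def monthly_letter_counts(letters: pd.DataFrame, names: pd.DataFrame,
                          cap: int = 3) -> pd.DataFrame:
    """Sent letters per ID and month, capped at `cap` (hypothetical helper)."""
    dates = pd.to_datetime(letters['DATE'])
    out = (letters.assign(Year=dates.dt.strftime('%Y'),
                          Month=dates.dt.strftime('%m'),
                          Sent=letters['Status'].eq('Sent'))
           .groupby(['ID', 'Year', 'Month'])['Sent']
           .sum()                 # number of 'Sent' rows per group
           .clip(upper=cap)       # apply the configurable monthly limit
           .rename('letter_count')
           .reset_index())
    # Attach the names and order the columns like the desired output
    out = out.merge(names, on='ID', how='left')
    return out[['ID', 'Name', 'Month', 'Year', 'letter_count']]


names = pd.DataFrame({'ID': [23, 24], 'Name': ['Dan', 'Bob']})
letters = pd.DataFrame({
    'ID': [23, 23, 23, 23, 23, 24, 24],
    'ID2': [10, 20, 30, 40, 50, 55, 65],
    'DATE': ['2019-06-11', '2019-06-21', '2019-06-26', '2019-06-27',
             '2019-12-02', '2019-06-27', '2019-06-29'],
    'Status': ['Sent'] * 7,
})

result = monthly_letter_counts(letters, names, cap=3)
print(result)
```

Changing the limit is then a matter of passing a different `cap`, e.g. `monthly_letter_counts(letters, names, cap=2)`.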

I still wonder whether this addresses the real point of the exercise. In the end, you are only fooling yourself in the summary frame.

