Loop Through Days of the Week in Pandas Dataframe

Question

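I have a Pandas DataFrame whose start column has dtype datetime64[ns, UTC], and the DataFrame is sorted in ascending order by the start column. From this DataFrame, I created a new (updated) DataFrame that indicates the day of the week of the start column, using the following: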

format_datetime_df['day_of_week'] = format_datetime_df['start'].dt.dayofweek

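I would like to pass the DataFrame to a function. The function needs to loop through the days of the week, so from 0 to 6, and keep a running total of the distance (held in the 'distance' column). If the distance covered in a week is greater than 15, a counter is incremented. It needs to do this for all rows of the DataFrame. The return value of the function will be the total number of weeks in which the distance exceeded 15.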

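I am confused about how to implement this because of how the values in my 'day_of_week' column start.

Week 1 would consist of 3, 3, 5, whereas week 2 would consist of 1, 5, ...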

I would like to do something like

number_of_weeks_over_10km = format_datetime_df.groupby().apply(weeks_over_10km)

but I am not sure what should go inside the groupby() call. I also feel like I am overcomplicating this.

Answer 1

This was complicated, but I figured it out. Here is the basic flow of what I did:

# Create a helper index that allows iteration by week while also considering the year

# Function to return the total distance for each week

# Create a NumPy array to store the total distance for each week

# Append the total distance for each week to the array

# Count the number of times the total distance for each week was > x (in km)

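The helper index that allows iteration by week while also considering the year came from another post on Stack Overflow. It had one consequence, though: I had to create the NumPy array, and append to it, outside the function for everything to work.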
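The original code for these steps is not shown, so the following is only a rough sketch of what such a flow might look like, not the author's actual implementation. It assumes the helper index is the (ISO year, ISO week) pair from dt.isocalendar(); the function name count_weeks_over and the sample rides are made up for illustration. In this sketch the array is kept inside the function, which seems to work fine.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: day_of_week runs 3, 3, 5 (week 1) and then 1, 5 (week 2).
rides = pd.DataFrame({
    "start": pd.to_datetime(
        ["2021-01-07 08:00", "2021-01-07 18:00", "2021-01-09 09:00",
         "2021-01-12 08:00", "2021-01-16 09:00"],
        utc=True,
    ),
    "distance": [5.0, 6.0, 7.0, 4.0, 8.0],
})


def count_weeks_over(df, threshold_km):
    """Count the weeks whose summed distance is greater than threshold_km."""
    # Helper index (an assumption): ISO year and ISO week for every row.
    iso = df["start"].dt.isocalendar()

    # NumPy array that collects the total distance of each week.
    weekly_totals = np.array([])

    # Iterate week by week; grouping on the year as well keeps the same
    # week number from different years apart.
    for _, week_rows in df.groupby([iso["year"], iso["week"]]):
        weekly_totals = np.append(weekly_totals, week_rows["distance"].sum())

    # Count how many weekly totals exceed the threshold.
    return int((weekly_totals > threshold_km).sum())


weeks_over_15 = count_weeks_over(rides, 15)
```

With the sample data above, the first week totals 18 and the second 12, so only one week is counted.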

Answer 2

I think you can solve this with Pandas alone, without a function. Just use dt.isocalendar() to determine the year and week:

df["isoweek"] = (df["start"].dt.isocalendar()["year"].astype(str)
 + " "
 + df["start"].dt.isocalendar()["week"].astype(str)
)

Then use groupby to sum the distance per week and count the weeks above 15:

weeks_above_15 = (df.groupby("isoweek")["distance"].sum() > 15).sum()
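For anyone who wants to try this end to end, here is a small self-contained sketch. The sample data is invented, and grouping on the year and week columns directly, instead of building the "isoweek" string first, is just a variation that should give the same count. The rides straddle New Year on purpose: 2021-01-02 still belongs to ISO week 53 of 2020, which is why the year has to be part of the key.

```python
import pandas as pd

# Hypothetical sample data spanning a year boundary.
df = pd.DataFrame({
    "start": pd.to_datetime(
        ["2020-12-31 10:00", "2021-01-02 10:00",   # ISO year 2020, week 53
         "2021-01-05 10:00", "2021-01-09 10:00",   # ISO year 2021, week 1
         "2021-01-12 10:00"],                      # ISO year 2021, week 2
        utc=True,
    ),
    "distance": [9.0, 8.0, 6.0, 5.0, 16.0],
})

# ISO calendar columns (year, week, day) for each row.
iso = df["start"].dt.isocalendar()

# Total distance per (ISO year, ISO week).
weekly_distance = df.groupby([iso["year"], iso["week"]])["distance"].sum()

# Number of weeks whose total distance is greater than 15.
weeks_above_15 = int((weekly_distance > 15).sum())
```

Here the weekly totals are 17, 11 and 16, so two weeks come out above 15.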

python

dataframe

pandas

data-science