Manipulating by groups within a data frame
I have a data frame that looks more or less like this (but much longer and with more ObsDOY values):
Position ObsDOY Offset Lin_Flux
<chr> <dbl> <dbl> <dbl>
1 Inter-row (unplanted) 122 1 10.7
2 Tree row 122 1 10.3
3 Tree row 122 1 16.2
4 Inter-row (planted) 122 1 9.08
5 Inter-row (trenched) 122 1 3.57
6 Inter-row (trenched) 122 1 12.3
7 Inter-row (trenched) 122 1 9.36
8 Inter-row (trenched) 122 1 7.73
9 Inter-row (trenched) 122 1 10.1
10 Inter-row (trenched) 122 1 7.14
11 Inter-row (planted) 143 1 4.44
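I want to add a new column that is the difference between each row's Lin_Flux and the mean of the Lin_Flux values that have Position = "Inter-row (trenched)" and the same ObsDOY (observation day of year). In other words:
newcol <- Lin_Flux[ObsDOY == x] - mean(Lin_Flux[ObsDOY == x & Position == "Inter-row (trenched)"])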
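I know how to do this manually with the dplyr package by filtering out subsets and then using them, but I would really appreciate suggestions for a more elegant solution, because I will keep adding observations and don't want to rewrite the script every time to include the new ObsDOY values.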
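It would be best to include some sample data (something I had to learn too); that makes it easier to reproduce your problem. I took the liberty of making some up, based on my understanding of what you want to do.
If this is your data: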
df <- data.frame(Position = c("A", "A", "B", "B", "B", "C", "C"), ObsDOY = c("Mon", "Mon", "Tue", "Tue", "Mon", "Fri", "Fri"), Lin_Flux = c(2, 3, 5, 2, 4, 1, 1))
then this does what I understood you to want (each value minus the mean of its own Position/ObsDOY group):
library(dplyr)

df <- df %>%
  group_by(Position, ObsDOY) %>%                # one group per Position/ObsDOY pair
  mutate(newcol = Lin_Flux - mean(Lin_Flux))    # deviation from that group's mean
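You are on the right track with dplyr: group_by basically creates small subsamples based on the variables you specify, and everything that follows (for example mean()) is then computed within those subsamples.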
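Because everything after group_by runs within each subsample, the formula from the question can also be written directly: group by ObsDOY only and subset Lin_Flux by Position inside mean(). The following is a minimal sketch under that reading of the question, not a tested solution for the real data; the data values and the helper column name trenched_mean are made up for illustration.

```r
library(dplyr)

# Small made-up data set (hypothetical values, not the asker's data)
df <- tribble(
  ~Position,              ~ObsDOY, ~Lin_Flux,
  "Tree row",             122,     10,
  "Inter-row (trenched)", 122,     4,
  "Inter-row (trenched)", 122,     6,
  "Tree row",             143,     9,
  "Inter-row (trenched)", 143,     7
)

result <- df %>%
  group_by(ObsDOY) %>%          # one group per observation day
  mutate(
    # mean of the trenched rows within the current day only
    trenched_mean = mean(Lin_Flux[Position == "Inter-row (trenched)"]),
    newcol = Lin_Flux - trenched_mean
  ) %>%
  ungroup()
```

New ObsDOY values are picked up automatically, since nothing in the pipeline names a specific day. One thing to watch: for a day with no trenched rows, mean() of an empty vector gives NaN, so newcol is NaN for that day.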
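You can use dplyr without manually subsetting for each distinct ObsDOY value. To do so, you create a new dataset with the mean Lin_Flux for each value of ObsDOY, keeping only the observations with Position == "Inter-row (trenched)".
After that, you merge the data back into the original dataset and take the difference.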
library(dplyr)
df <- tribble(
~ID, ~Position, ~ObsDOY, ~Offset, ~Lin_Flux,
1, "Inter-row (unplanted)", 122, 1, 10.7,
2, "Tree row", 122, 1, 10.3,
3, "Tree row", 122, 1, 16.2,
4, "Inter-row (planted)", 122, 1, 9.08,
5, "Inter-row (trenched)", 122, 1, 3.57,
6, "Inter-row (trenched)", 122, 1, 12.3,
7, "Inter-row (trenched)", 122, 1, 9.36,
8, "Inter-row (trenched)", 122, 1, 7.73,
9, "Inter-row (trenched)", 122, 1, 10.1,
10, "Inter-row (trenched)", 122, 1, 7.14,
11, "Inter-row (planted)", 143, 1, 4.44
)
df %>%
filter(Position == "Inter-row (trenched)") %>%
group_by(ObsDOY) %>%
summarize(Lin_Flux_mean = mean(Lin_Flux)) %>%
right_join(df, by = c("ObsDOY")) %>%
mutate(Lin_Flux_diff = Lin_Flux - Lin_Flux_mean)
# A tibble: 11 x 7
ObsDOY Lin_Flux_mean ID Position Offset Lin_Flux Lin_Flux_diff
<dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 122 8.37 1 Inter-row (unplanted) 1 10.7 2.33
2 122 8.37 2 Tree row 1 10.3 1.93
3 122 8.37 3 Tree row 1 16.2 7.83
4 122 8.37 4 Inter-row (planted) 1 9.08 0.713
5 122 8.37 5 Inter-row (trenched) 1 3.57 -4.80
6 122 8.37 6 Inter-row (trenched) 1 12.3 3.93
7 122 8.37 7 Inter-row (trenched) 1 9.36 0.993
8 122 8.37 8 Inter-row (trenched) 1 7.73 -0.637
9 122 8.37 9 Inter-row (trenched) 1 10.1 1.73
10 122 8.37 10 Inter-row (trenched) 1 7.14 -1.23
11 143 NA 11 Inter-row (planted) 1 4.44 NA
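Row 11 comes out as NA because day 143 has no "Inter-row (trenched)" observation in this sample, so there is no reference mean to join. As a variation on the same idea, the sketch below starts from the original data and uses left_join, which keeps the original columns first, and then lists the days that lack a reference. The data values and the names trenched_means and missing_days are hypothetical, chosen only for illustration.

```r
library(dplyr)

# Hypothetical mini data set; day 143 has no trenched observation
df <- tribble(
  ~ID, ~Position,              ~ObsDOY, ~Lin_Flux,
  1,   "Tree row",             122,     10,
  2,   "Inter-row (trenched)", 122,     4,
  3,   "Inter-row (trenched)", 122,     6,
  4,   "Inter-row (planted)",  143,     9
)

# Per-day reference means, computed from the trenched rows only
trenched_means <- df %>%
  filter(Position == "Inter-row (trenched)") %>%
  group_by(ObsDOY) %>%
  summarize(Lin_Flux_mean = mean(Lin_Flux))

# left_join keeps every row of df and puts its columns first
result <- df %>%
  left_join(trenched_means, by = "ObsDOY") %>%
  mutate(Lin_Flux_diff = Lin_Flux - Lin_Flux_mean)

# Days without a trenched reference (Lin_Flux_mean is NA after the join)
missing_days <- result %>%
  filter(is.na(Lin_Flux_mean)) %>%
  distinct(ObsDOY)
```

Checking missing_days after each data update is a cheap way to notice days on which the difference cannot be computed.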