R: Adding columns and rows of data based on multiple conditions on an existing dataframe
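I want to restructure my land-use classification data frame and add new rows and columns based on conditions in the data frame. I have been trying to do this with dplyr, but the examples I have found tend to reduce the number of columns or rows rather than add rows based on conditions. I tried looping through the dataset to add rows, but I wonder whether there is a better way to do this in dplyr. I am also open to using a different library, but this is a very large categorical dataset, and dplyr seems to work well with data frames.
Here is a code example of my current data frame (df_old) and what I would like it to end up looking like (df_new).
What I want is for a new row to be created whenever the value in Year1990-Year2015 changes. Example: ID 424 is 51 in 1990, changes to 21 in 2000, and has stayed 21 ever since. This means the new data frame should have two rows for ID 424. The first has Start_Year 1990, when the land use was forest (Landuse = 51), and it stays forest until the change in 2000. Since it is pavement in 2000, we assume it was still forest in 1999, so End_Year for the first ID 424 row is 1999. Then a new row appears for ID 424 with Start_Year 2000, because it changed to pavement (Landuse = 21), and it stays 21 until End_Year (today).
For context, the dataset represents how areas within a city have changed, where the numbers in Year1990-Year2015 identify different land-use classes (21 = pavement, 24 = park, 25 = residential, 51 = forest, 41 = agriculture).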
df_old <- data.frame(ID = c(424,426,427,428),
                     Parameter = c(0.01,0.03,0.03,0.01),
                     City = c("Abbotsford","Abbotsford","Abbotsford","Abbotsford"),
                     Area = c(3.12,7.98,2.01,0.48),
                     Year1990 = c(51,51,51,41),
                     Year2000 = c(21,51,51,41),
                     Year2005 = c(21,51,51,25),
                     Year2010 = c(21,51,51,24),
                     Year2015 = c(21,51,51,25))
df_new <- data.frame(ID = c(424,424,426,427,428,428,428,428),
                     Parameter = c(0.01,0.01,0.03,0.03,0.01,0.01,0.01,0.01),
                     City = c("Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford"),
                     Area = c(3.12,3.12,7.98,2.01,0.48,0.48,0.48,0.48),
                     Start_Year = c(1990,2000,1990,1990,1990,2005,2010,2015),
                     End_Year = c(1999,"present","present","present",2004,2009,2014,"present"),
                     Landuse = c("51-51","51-21","51-51","51-51","41-41","41-25","25-24","24-25"))
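df_new above is the final product I am hoping for.
This solution works for your example data, but it is hard to determine the 'rules' governing the operations you want (so it is hard to know whether it will work on your real data). If this fails on your real data, please edit your post with more information.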
library(tidyverse)
df_old <- data.frame(ID = c(424,426,427,428),
                     Parameter = c(0.01,0.03,0.03,0.01),
                     City = c("Abbotsford","Abbotsford","Abbotsford","Abbotsford"),
                     Area = c(3.12,7.98,2.01,0.48),
                     Year1990 = c(51,51,51,41),
                     Year2000 = c(21,51,51,41),
                     Year2005 = c(21,51,51,25),
                     Year2010 = c(21,51,51,24),
                     Year2015 = c(21,51,51,25))
df_new <- data.frame(ID = c(424,424,426,427,428,428,428,428),
                     Parameter = c(0.01,0.01,0.03,0.03,0.01,0.01,0.01,0.01),
                     City = c("Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford"),
                     Area = c(3.12,3.12,7.98,2.01,0.48,0.48,0.48,0.48),
                     Start = c(1990,2000,1990,1990,1990,2005,2010,2015),
                     End = c(1999,"present","present","present",2004,2009,2014,"present"),
                     LU = c("51-51","51-21","51-51","51-51","41-41","41-25","25-24","24-25"))
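df_old %>%
  # reshape the Year columns to long format: one row per ID and survey year
  pivot_longer(cols = -c(1:4)) %>%
  group_by(ID) %>%
  # pull the four-digit year out of names such as "Year1990"
  mutate(Start = as.numeric(str_extract(name, "\\d+"))) %>%
  # label each row "previous-current"; the first year is paired with itself
  mutate(`LU-LU` = paste(lag(value, default = first(value)), "-", value, sep = "")) %>%
  # drop repeated labels within an ID, then keep only rows where the value changed
  distinct(`LU-LU`, .keep_all = TRUE) %>%
  filter(value != lag(value, default = 0)) %>%
  # a period ends the year before the next one starts; the last one is open-ended
  mutate(End = lead(Start, default = NA) - 1,
         End = replace_na(as.character(End), "present")) %>%
  select(c(ID, Parameter, City, Area, Start, End, `LU-LU`))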
#> # A tibble: 8 × 7
#> # Groups:   ID [4]
#>      ID Parameter City        Area Start End     `LU-LU`
#>   <dbl>     <dbl> <chr>      <dbl> <dbl> <chr>   <chr>
#> 1   424      0.01 Abbotsford  3.12  1990 1999    51-51
#> 2   424      0.01 Abbotsford  3.12  2000 present 51-21
#> 3   426      0.03 Abbotsford  7.98  1990 present 51-51
#> 4   427      0.03 Abbotsford  2.01  1990 present 51-51
#> 5   428      0.01 Abbotsford  0.48  1990 2004    41-41
#> 6   428      0.01 Abbotsford  0.48  2005 2009    41-25
#> 7   428      0.01 Abbotsford  0.48  2010 2014    25-24
#> 8   428      0.01 Abbotsford  0.48  2015 present 24-25
Created on 2021-12-03 by the reprex package (v2.0.1)
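One thing to watch for on real data: `distinct()` keeps only the first occurrence of each transition label within an ID, so a parcel that flips back and forth (say 41, 25, 41, 25) would lose its second 41-to-25 change. Below is a minimal sketch of a variant that simply keeps every row whose value differs from the previous survey year. It assumes the same layout as `df_old` (identifier columns followed by `Year####` columns); the helper name `landuse_periods`, the output column name `LU`, and the toy `flip` row are made up for illustration and are not part of the original data.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: one row per run of unchanged land use within each ID.
landuse_periods <- function(df) {
  df %>%
    # long format; strip the "Year" prefix and store the year as a number
    pivot_longer(cols = starts_with("Year"),
                 names_to = "Start", values_to = "value",
                 names_prefix = "Year",
                 names_transform = list(Start = as.numeric)) %>%
    group_by(ID) %>%
    # previous class within the ID; the first year is paired with itself
    mutate(prev = lag(value, default = first(value)),
           LU = paste(prev, value, sep = "-")) %>%
    # keep the first survey year and every year where the class changed
    filter(row_number() == 1 | value != prev) %>%
    # a period ends the year before the next one starts; the last is open-ended
    mutate(End = lead(Start) - 1,
           End = if_else(is.na(End), "present", as.character(End))) %>%
    ungroup() %>%
    select(ID, Parameter, City, Area, Start, End, LU)
}

# Made-up parcel that switches between two classes more than once
flip <- data.frame(ID = 1, Parameter = 0.01, City = "Abbotsford", Area = 1,
                   Year1990 = 41, Year2000 = 25, Year2005 = 41, Year2010 = 25)
landuse_periods(flip)
```

Calling `landuse_periods(df_old)` should give the same eight periods as the pipeline above, while the `flip` row comes back as four periods because both 41-to-25 changes are kept.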