R: If name includes specific text, then group it
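I am using Energy Performance Certificate data to identify the heating fuel types of buildings in an area. However, the descriptions are split into more than 60 different subsets of the 9 main fuel types. I would like to add another column for fuel type so that the buildings can be grouped by those 9 main fuel types.
An example of the relevant columns of the data is: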
BuildingID <- c(1,2,3,4,5,6,7,8,9,10)
MainHeatDesc <- c("Boiler and radiators, mains gas", "Boiler and radiators, oil", "Room heaters, electric", "Room heaters, LPG", "Air source heat pump, underfloor heating, electric", "Air source heat pump, fan coil units, electric", "Ground source heat pump, mains gas", "Electric storage heaters", "Room heaters, wood logs", "Boilers and radiators, wood chips")
data <- data.frame(BuildingID, MainHeatDesc)
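This is a small example using some subsets of the original data. In this example, I would like to create another column for the main fuel type, grouping the rows into: Mains gas, Oil, Electric, LPG and Wood.
The end result should look like this: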
# BuildingID MainHeatDesc MainFuelType
# 1 Boiler and radiators, mains gas Mains gas
# 2 Boiler and radiators, oil Oil
# 3 Room heaters, electric Electric
# 4 Room heaters, LPG LPG
# 5 Air source heat pump, underfloor heating, electric Electric
# 6 Air source heat pump, fan coil units, electric Electric
# 7 Ground source heat pump, mains gas Mains Gas
# 8 Electric storage heaters Electric
# 9 Room heaters, wood logs Wood
# 10 Boilers and radiators, wood chips Wood
If anyone can help me, I would be very grateful. If you have any questions or need more information, please let me know.
Thanks!
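One dplyr and stringr option could be:
library(dplyr)
library(stringr)

# "\\b" is a regex word boundary; a single "\b" in an R string is a backspace character
data %>%
  mutate(group = str_extract(MainHeatDesc, regex("\\bMains gas|\\bOil|\\bElectric|\\bLPG|\\bwood", ignore_case = TRUE)))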
BuildingID MainHeatDesc group
1 1 Boiler and radiators, mains gas mains gas
2 2 Boiler and radiators, oil oil
3 3 Room heaters, electric electric
4 4 Room heaters, LPG LPG
5 5 Air source heat pump, underfloor heating, electric electric
6 6 Air source heat pump, fan coil units, electric electric
7 7 Ground source heat pump, mains gas mains gas
8 8 Electric storage heaters Electric
9 9 Room heaters, wood logs wood
10 10 Boilers and radiators, wood chips wood
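The word boundary in the pattern matters for this data, because "oil" also occurs inside "Boiler". A minimal base R sketch of the difference follows; the variable names are illustrative, and `perl = TRUE` is my choice here so that `\b` is interpreted by PCRE.

```r
desc <- "Boiler and radiators, mains gas"

# Without a word boundary, "oil" is found inside "Boiler" (starting at character 2)
pos_plain <- as.integer(regexpr("oil", desc, ignore.case = TRUE))

# With "\\b", "oil" must start at a word boundary, so this description has no match (-1)
pos_bounded <- as.integer(regexpr("\\boil", desc, ignore.case = TRUE, perl = TRUE))

print(c(plain = pos_plain, bounded = pos_bounded))
```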
If you have many patterns, you can build the regex from a vector like this:
x <- paste(paste0("\\b", c("Mains gas", "Oil", "Electric", "LPG", "wood"), "\\b"), collapse = "|")
data %>%
mutate(group = str_extract(MainHeatDesc, regex(x, ignore_case = TRUE)))
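For readers without stringr, roughly the same extraction can be sketched in base R. This is an illustration under assumptions, not part of the answer above: `fuel_types`, `pattern`, `desc` and `found` are made-up names, and "Community scheme" is a hypothetical description added to show what happens when nothing matches.

```r
fuel_types <- c("Mains gas", "Oil", "Electric", "LPG", "wood")
pattern <- paste(paste0("\\b", fuel_types, "\\b"), collapse = "|")

desc <- c("Boiler and radiators, oil",
          "Room heaters, wood logs",
          "Air source heat pump, fan coil units, electric",
          "Community scheme")  # hypothetical row with no known fuel type

# regexpr() returns -1 where there is no match; keep NA for those rows
hits <- regexpr(pattern, desc, ignore.case = TRUE, perl = TRUE)
found <- rep(NA_character_, length(desc))
found[hits > 0] <- regmatches(desc, hits)

print(found)
```

Like str_extract(), this keeps the text exactly as it appears in the description, so the case of the result follows the data rather than the pattern.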
If you want to match your expected output more closely, you can map each match onto its canonical label with match(). A replacement vector passed to str_replace() would be recycled by row position, not by matched value, so the label is looked up instead:
y <- c("Mains gas", "Oil", "Electric", "LPG", "Wood")
data %>%
  mutate(group = str_extract(MainHeatDesc, regex(x, ignore_case = TRUE)),
         group = y[match(str_to_lower(group), str_to_lower(y))])
BuildingID MainHeatDesc group
1 1 Boiler and radiators, mains gas Mains gas
2 2 Boiler and radiators, oil Oil
3 3 Room heaters, electric Electric
4 4 Room heaters, LPG LPG
5 5 Air source heat pump, underfloor heating, electric Electric
6 6 Air source heat pump, fan coil units, electric Electric
7 7 Ground source heat pump, mains gas Mains gas
8 8 Electric storage heaters Electric
9 9 Room heaters, wood logs Wood
10 10 Boilers and radiators, wood chips Wood
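The label mapping can also be sketched with a named vector used as a lookup table, keyed on the lower-cased match. The names `lookup`, `extracted` and `canonical` are illustrative, and `extracted` stands in for whatever the extraction step returned.

```r
# Keys are lower-case matches, values are the canonical spellings
lookup <- c("mains gas" = "Mains gas", "oil" = "Oil", "electric" = "Electric",
            "lpg" = "LPG", "wood" = "Wood")

# Stand-in for extracted matches, including a row where nothing was found
extracted <- c("mains gas", "oil", "electric", "LPG", "Electric", "wood", NA)

# Index by value, not by position; unmatched rows stay NA
canonical <- unname(lookup[tolower(extracted)])

print(canonical)
```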
Similar logic to @tmfmnk's answer, but in base R using sub.
types <- c('Mains Gas', 'Oil', 'Electric', 'LPG', 'Wood')
# "\\b" is a word boundary and "\\1" refers to the captured fuel type
data$MainFuelType <- sub(paste0(".*(?i)(", paste0("\\b", types, "\\b",
                  collapse = "|"), ").*"), "\\1", data$MainHeatDesc)
data
# BuildingID MainHeatDesc MainFuelType
#1 1 Boiler and radiators, mains gas mains gas
#2 2 Boiler and radiators, oil oil
#3 3 Room heaters, electric electric
#4 4 Room heaters, LPG LPG
#5 5 Air source heat pump, underfloor heating, electric electric
#6 6 Air source heat pump, fan coil units, electric electric
#7 7 Ground source heat pump, mains gas mains gas
#8 8 Electric storage heaters Electric
#9 9 Room heaters, wood logs wood
#10 10 Boilers and radiators, wood chips wood
The dynamically generated regex looks like this:
paste0(".*(?i)(", paste0("\\b", types, "\\b", collapse = "|"), ").*")
#[1] ".*(?i)(\\bMains Gas\\b|\\bOil\\b|\\bElectric\\b|\\bLPG\\b|\\bWood\\b).*"
where (?i) makes the match case-insensitive.
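One thing to keep in mind with sub() is that it returns the input unchanged when the pattern does not match, so an unknown description would end up in the fuel type column as-is. A hedged sketch of guarding against that follows. The names `rx`, `desc` and `out` are illustrative, "Community scheme" is a hypothetical non-matching description, and `perl = TRUE` is my addition so the inline (?i) flag is handled by PCRE.

```r
types <- c("Mains Gas", "Oil", "Electric", "LPG", "Wood")
rx <- paste0(".*(?i)(", paste0("\\b", types, "\\b", collapse = "|"), ").*")

desc <- c("Room heaters, LPG", "Electric storage heaters", "Community scheme")

# Keep only the captured fuel type for rows that match
out <- sub(rx, "\\1", desc, perl = TRUE)

# sub() leaves non-matching strings untouched, so mark those rows as NA
out[!grepl(rx, desc, perl = TRUE)] <- NA

print(out)
```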
Another approach is to use nested ifelse statements with grepl, which tests each description against a regex pattern:
data$MainFuelType <- ifelse(grepl("mains gas", data$MainHeatDesc), "Mains gas",
  ifelse(grepl("\\boil", data$MainHeatDesc), "Oil",
    ifelse(grepl("(e|E)lectric", data$MainHeatDesc), "Electric",
      ifelse(grepl("LPG", data$MainHeatDesc), "LPG", "Wood"))))
Result:
data
BuildingID MainHeatDesc MainFuelType
1 1 Boiler and radiators, mains gas Mains gas
2 2 Boiler and radiators, oil Oil
3 3 Room heaters, electric Electric
4 4 Room heaters, LPG LPG
5 5 Air source heat pump, underfloor heating, electric Electric
6 6 Air source heat pump, fan coil units, electric Electric
7 7 Ground source heat pump, mains gas Mains gas
8 8 Electric storage heaters Electric
9 9 Room heaters, wood logs Wood
10 10 Boilers and radiators, wood chips Wood
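With 9 fuel types the nesting gets deep, and the final "Wood" branch also catches anything that matched none of the patterns. A table-driven variant of the same grepl idea might look like the sketch below. It is an assumption-laden illustration: `fuel_patterns`, `desc` and `fuel` are made-up names, and "Community scheme" is a hypothetical row with no recognisable fuel.

```r
desc <- c("Boiler and radiators, mains gas", "Boiler and radiators, oil",
          "Electric storage heaters", "Room heaters, wood logs",
          "Community scheme")  # hypothetical unmatched row

# One pattern per label; order decides which label wins if several match
fuel_patterns <- c("Mains gas" = "\\bmains gas\\b", "Oil" = "\\boil\\b",
                   "Electric" = "\\belectric\\b", "LPG" = "\\bLPG\\b",
                   "Wood" = "\\bwood\\b")

fuel <- rep(NA_character_, length(desc))
for (label in names(fuel_patterns)) {
  # Only fill rows that have not been labelled by an earlier pattern
  hit <- is.na(fuel) & grepl(fuel_patterns[[label]], desc,
                             ignore.case = TRUE, perl = TRUE)
  fuel[hit] <- label
}

print(fuel)
```

Rows that match nothing stay NA, which makes gaps in the pattern list easy to spot when extending it to the remaining fuel types.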