Change column value based on another column, but only for certain conditions in the first AND second column (R)
I have a data frame.
city <- as.character(c("London", "Unknown", "Birmingham", "Bristol", "Unknown", "Unknown", "Unknown", "Unknown"))
city_details <- as.character(c("London", "Camden", "Birmingham", "Outside London", "Camden Town", "Westminster", "London", "Birmingham"))
city_data <- data.frame(city, city_details)
Some values in the city column are unknown, but looking at city_details shows that most of them are actually in London.
So I can replace some of them:
city_data$city[grepl("Camden|Westminster", city_data$city_details)] <- 'London'
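The rows where city_details says "London" are trickier, though, because there is also "Outside London", so I don't want to simply pick up anything that matches the pattern "London".
For this purpose I am not looking for an approach that only accepts exact matches (that would not suit my real data).
So what I want is to perform this replacement only where the city value is Unknown.
So far I have tried the following, but the logic is clearly wrong, because all it actually does is change every Unknown value in the city column to London.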
city_data <- within(city_data, city[city == "Unknown"] <- (city[grepl("London", city_details)] <- 'London'))
Can anyone help?
I assume you only want to replace the city name when city is Unknown and city_details mentions "London". In that case, you can use the following:
city_data$city[(as.numeric(grepl("Unknown", city_data$city)) + as.numeric(grepl("London", city_data$city_details))) == 2] <- "London"
Does this answer your question?
I suggest the following:
one_hot <- grepl("Camden|Westminster|London", city_data$city_details) &
city_data$city == "Unknown"
city_data$city[one_hot] <- "London"
Example:
city <- as.character(c("London", "Unknown", "Birmingham", "Bristol", "Unknown", "Unknown", "Unknown", "Unknown"))
city_details <- as.character(c("London", "Camden", "Birmingham", "Outside London", "Camden Town", "Westminster", "London", "Tottenham"))
city_data <- data.frame(city, city_details)
> city_data
        city   city_details
1     London         London
2    Unknown         Camden
3 Birmingham     Birmingham
4    Bristol Outside London
5    Unknown    Camden Town
6    Unknown    Westminster
7    Unknown         London
8    Unknown      Tottenham
> one_hot <- grepl("Camden|Westminster|London", city_data$city_details) &
+ city_data$city == "Unknown"
> city_data$city[one_hot] <- "London"
> city_data
        city   city_details
1     London         London
2     London         Camden
3 Birmingham     Birmingham
4    Bristol Outside London
5     London    Camden Town
6     London    Westminster
7     London         London
8    Unknown      Tottenham
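If the same replacement is needed more than once, the masking idea from the example could be wrapped in a small helper. This is only a sketch: the function name `fill_unknown` and its arguments `pattern`, `to`, and `from` are hypothetical and not part of the original answer, and it assumes the columns are named `city` and `city_details` as in the example.

```r
# Hypothetical helper (not from the original answer): set city to `to`
# wherever city equals `from` AND city_details matches `pattern`.
fill_unknown <- function(df, pattern, to = "London", from = "Unknown") {
  # Element-wise AND of the two conditions gives the rows to overwrite
  mask <- df$city == from & grepl(pattern, df$city_details)
  df$city[mask] <- to
  df
}

city_data <- data.frame(
  city = c("London", "Unknown", "Birmingham", "Bristol",
           "Unknown", "Unknown", "Unknown", "Unknown"),
  city_details = c("London", "Camden", "Birmingham", "Outside London",
                   "Camden Town", "Westminster", "London", "Tottenham"),
  stringsAsFactors = FALSE  # keep plain character columns on R < 4.0 too
)

result <- fill_unknown(city_data, "Camden|Westminster|London")
```

Because the mask also requires `city == "Unknown"`, the "Outside London" row keeps Bristol and the Tottenham row stays Unknown, as in the output above.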
I also came up with the approach below, which seems more concise and intuitive to me. There is no need to convert to numeric.
city_data$city[grepl("Unknown", city_data$city) &
grepl("London|Camden|Westminster", city_data$city_details)] <- "London"
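As a quick check of why the numeric conversion is unnecessary, the sketch below compares the two ways of combining the conditions on the question's data. The variable names (`is_unknown`, `mentions_london`, `mask_sum`, `mask_and`) are illustrative only.

```r
city <- c("London", "Unknown", "Birmingham", "Bristol",
          "Unknown", "Unknown", "Unknown", "Unknown")
city_details <- c("London", "Camden", "Birmingham", "Outside London",
                  "Camden Town", "Westminster", "London", "Birmingham")
city_data <- data.frame(city, city_details, stringsAsFactors = FALSE)

is_unknown      <- grepl("Unknown", city_data$city)
mentions_london <- grepl("London", city_data$city_details)

# Arithmetic version: TRUE/FALSE become 1/0, so the sum is 2 only when both hold
mask_sum <- (as.numeric(is_unknown) + as.numeric(mentions_london)) == 2

# Logical version: element-wise AND expresses the same condition directly
mask_and <- is_unknown & mentions_london

same <- identical(mask_sum, mask_and)
selected_rows <- which(mask_and)
```

Both masks select the same rows, so `&` can replace the sum-equals-2 test without changing the result.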