Use recode to clean a data frame column
How can I use recode() to "clean/strip" certain parts of a column in a data frame? The original data frame looks like this:
df <- data.frame(duration = c("concentration, up to 2 minutes", "concentration, up to 4 minutes", "up to 6 hours"), name = c("Earth", "Water", "Fire"))
The cleaned-up version should look like this:
df <- data.frame(duration = c("2 minutes", "4 minutes", "6 hours"), name = c("Earth", "Water", "Fire"))
So I should either delete "concentration," and "up to", or use the recode function to replace them with empty strings.
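Below are solutions using dplyr::recode() and stringr::str_remove(). Note that recode() replaces whole values rather than parts of a string, so each full original value has to be mapped to its cleaned form.
My advice is to learn the latter approach as well: it introduces regular expressions, which are a more powerful way to transform strings.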
Solution with dplyr::recode()
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df <- data.frame(duration = c("concentration, up to 2 minutes",
                              "concentration, up to 4 minutes",
                              "up to 6 hours"),
                 name = c("Earth", "Water", "Fire"))
# Map each full original value to its cleaned replacement
df$duration <- recode(df$duration,
                      "concentration, up to 2 minutes" = "2 minutes",
                      "concentration, up to 4 minutes" = "4 minutes",
                      "up to 6 hours" = "6 hours")
df
#>    duration  name
#> 1 2 minutes Earth
#> 2 4 minutes Water
#> 3   6 hours  Fire
Created on 2020-05-04 by the reprex package (v0.3.0)
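Because recode() matches whole values, every distinct original string needs its own mapping. When there are many of them, it can be easier to keep the mapping in a named vector and splice it in with !!!. The following is a minimal sketch and not part of the original answer: the lookup vector and the extra value "up to 8 hours" are made up for illustration. It also shows that a character value without a mapping passes through unchanged.

```r
library(dplyr)

# Hypothetical lookup table: names are the old values, elements the new ones
lookup <- c(
  "concentration, up to 2 minutes" = "2 minutes",
  "concentration, up to 4 minutes" = "4 minutes",
  "up to 6 hours"                  = "6 hours"
)

# "up to 8 hours" is an illustrative value that has no entry in lookup
x <- c("concentration, up to 2 minutes", "up to 6 hours", "up to 8 hours")

# Splice the named vector into recode() as old = new arguments
recoded <- recode(x, !!!lookup)
recoded
# "2 minutes", "6 hours", "up to 8 hours"
```

If the column can contain values you have not listed, a pattern-based approach is usually less brittle than an exhaustive mapping. Recent dplyr releases also mark recode() as superseded in favor of case_match(), which may be worth checking for new code.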
Solution with stringr::str_remove()
library(stringr)
df <- data.frame(duration = c("concentration, up to 2 minutes",
                              "concentration, up to 4 minutes",
                              "up to 6 hours"),
                 name = c("Earth", "Water", "Fire"))
# Remove everything from the start of the string up to (not including) a digit.
# The backslash must be doubled inside an R string literal.
df$duration <- str_remove(df$duration, "^.*(?=\\d)")
df
#>    duration  name
#> 1 2 minutes Earth
#> 2 4 minutes Water
#> 3   6 hours  Fire
Created on 2020-05-04 by the reprex package (v0.3.0)
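The pattern "^.*(?=\\d)" reads as: from the start of the string, match as much as possible, stopping right before a digit. Because .* is greedy, it stops before the last digit in the string. That is fine for the single-digit durations in the question, but a value such as "10 minutes" would be trimmed to "0 minutes". The sketch below compares it with two alternatives; the sample vector x and the value "up to 10 minutes" are hypothetical and not taken from the question.

```r
library(stringr)

# Illustrative input; the second value is made up to show the multi-digit case
x <- c("concentration, up to 2 minutes", "up to 10 minutes")

# Greedy pattern from the answer: stops before the LAST digit in the string
greedy <- str_remove(x, "^.*(?=\\d)")
# "2 minutes", "0 minutes"

# Strip only the leading run of non-digit characters
leading <- str_remove(x, "^\\D*")
# "2 minutes", "10 minutes"

# Remove the literal phrases named in the question, then tidy the whitespace
literal <- str_squish(str_remove_all(x, "concentration,|up to"))
# "2 minutes", "10 minutes"
```

The "^\\D*" variant assumes every value contains the number you want to keep as its first digit sequence; the literal-phrase variant assumes the unwanted text is limited to the two phrases listed.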