How to remove special characters from a column in R without disturbing the other values in the string
Hi, my dataset df looks like this:
pateint_id NAME A                country
1001       kam  0..8             IND
1002       kam  0..8             IND
1003       kam  1.2.             IND
1004       sat  5.4 ( 6.30 PM )  IND
1005       sat  0.6 {2.00 AM}    IND
1006       sat  1-0              IND
1007       bas  76 MMOL          IND
1008       bas  2.3 (Re-Checked) IND
1009       bas  72 MMOL \L       IND
1010       bas  <0.3             IND
I want the output to be:
pateint_id NAME A   country
1001       kam  0.8 IND
1002       kam  0.8 IND
1003       kam  1.2 IND
1004       sat  5.4 IND
1005       sat  0.6 IND
1006       sat  1   IND
1007       bas  76  IND
1008       bas  2.3 IND
1009       bas  72  IND
1010       bas  0.3 IND
I tried using gsub on that column, but the result is NA:
df$A <- as.numeric(as.character(gsub('[a-zA-Z]', "", df$A)))
Thanks in advance...
If you first collapse the two dots in the first two values into a single dot, you can use parse_number from readr to get the data directly in numeric form.
readr::parse_number(sub('\\.{1,}', '.', df$A))
#[1] 0.8 0.8 1.2 5.4 0.6 1.0 76.0 2.3 72.0 0.3
Or use str_extract from stringr:
as.numeric(stringr::str_extract(sub('\\.{1,}', '.', df$A), '\\d+\\.?\\d?'))
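If you would rather not depend on readr or stringr, the same two steps (collapse the repeated dots, then keep the first number) can be written in base R. This is a minimal sketch, assuming each value has at most one run of repeated dots and the wanted number is always the first one in the string; the names A, cleaned, m and A_num are only for illustration:

```r
# Illustrative copy of column A from the question's data
A <- c("0..8", "0..8", "1.2.", "5.4 ( 6.30 PM )", "0.6 {2.00 AM}", "1-0",
       "76 MMOL", "2.3 (Re-Checked)", "72 MMOL \\L", "<0.3")

# Step 1: collapse the first run of dots into a single dot ("0..8" -> "0.8")
cleaned <- sub("\\.+", ".", A)

# Step 2: find the first number (digits, optionally a dot and more digits)
m <- regexpr("[0-9]+(\\.[0-9]+)?", cleaned)

# regmatches() returns only the matched substrings, so fill a full-length
# vector; strings with no number at all stay NA instead of shifting results
A_num <- rep(NA_real_, length(cleaned))
A_num[m != -1] <- as.numeric(regmatches(cleaned, m))
A_num
```

On the data frame itself you would apply the same steps to df$A and assign the result back to df$A.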
Data
df <- structure(list(pateint_id = 1001:1010, NAME = c("kam", "kam",
"kam", "sat", "sat", "sat", "bas", "bas", "bas", "bas"), A = c("0..8",
"0..8", "1.2.", "5.4 ( 6.30 PM )", "0.6 {2.00 AM}", "1-0", "76 MMOL",
"2.3 (Re-Checked)", "72 MMOL \\L", "<0.3"), country = c("IND",
"IND", "IND", "IND", "IND", "IND", "IND", "IND", "IND", "IND"
)), class = "data.frame", row.names = c(NA, -10L))
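To see why the attempt in the question returns NA, it helps to look at the intermediate strings. A small base-R check on an illustrative copy of column A (the names stripped and as_num are only for this example): removing letters leaves the doubled dots, brackets, dashes, the backslash and the < sign in place, and as.numeric() turns any string that is not a single clean number into NA.

```r
# Illustrative copy of column A from the question's data
A <- c("0..8", "0..8", "1.2.", "5.4 ( 6.30 PM )", "0.6 {2.00 AM}", "1-0",
       "76 MMOL", "2.3 (Re-Checked)", "72 MMOL \\L", "<0.3")

# The attempt from the question: delete letters only
stripped <- gsub("[a-zA-Z]", "", A)
print(stripped)

# as.numeric() yields NA (with a coercion warning, suppressed here) for every
# string that is not a plain number; leading/trailing spaces are tolerated,
# so only "76 " converts
as_num <- suppressWarnings(as.numeric(stripped))
as_num
```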