How to remove special characters in a column without disturbing other values in the string in R

Hi, my dataset df looks like this:

pateint_id  NAME   A            country
1001        kam  0..8               IND
1002        kam  0..8               IND
1003        kam  1.2.               IND
1004        sat  5.4 ( 6.30 PM )    IND
1005        sat  0.6 {2.00 AM}      IND
1006        sat  1-0                IND
1007        bas  76 MMOL            IND
1008        bas  2.3 (Re-Checked)   IND
1009        bas  72 MMOL \L         IND
1010        bas  <0.3               IND

I would like the output to be:

pateint_id  NAME    A   country
1001         kam    0.8  IND
1002         kam    0.8  IND
1003         kam    1.2  IND
1004         sat    5.4  IND
1005         sat    0.6  IND
1006         sat    1    IND
1007         bas    76   IND
1008         bas    2.3  IND
1009         bas    72   IND
1010         bas    0.3  IND

I tried using gsub on that specific column, but the result is NA:

df$A <- as.numeric(as.character(gsub('[a-zA-Z]', "", df$A)))

Thanks in advance.

If you replace the double dots in the first two values with a single dot, you can use parse_number from readr to get the data directly in numeric format.

readr::parse_number(sub('\\.{1,}', '.', df$A))
#[1]  0.8  0.8  1.2  5.4  0.6  1.0 76.0  2.3 72.0  0.3
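
For context, here is a minimal sketch of why the gsub attempt in the question returns NA. Deleting letters alone leaves other non-numeric characters behind, which is why the approaches here parse or extract the number instead. The vector x is a hypothetical subset of df$A used only for illustration.

```r
# Hypothetical subset of df$A, for illustration only
x <- c("0..8", "1.2.", "5.4 ( 6.30 PM )", "<0.3")

# Deleting letters leaves the doubled dot, the trailing dot, the brackets,
# the second number and the "<" sign in place
stripped <- gsub("[a-zA-Z]", "", x)
print(stripped)

# None of the leftovers is a valid number, so as.numeric() yields NA for each
result <- suppressWarnings(as.numeric(stripped))
print(result)
```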

Or use str_extract:

as.numeric(stringr::str_extract(sub('\\.{1,}', '.', df$A), '\\d+\\.?\\d?'))
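
If adding a package is not an option, the same idea can probably be sketched in base R with regexpr and regmatches. This is a swapped-in alternative, not the method above: the helper name extract_first_number is hypothetical, and the pattern assumes the first integer or decimal in each string is the value you want.

```r
# Hypothetical helper: return the first number found in each string
extract_first_number <- function(x) {
  # Collapse a run of two or more dots into one, e.g. "0..8" -> "0.8"
  x <- sub("\\.{2,}", ".", x)
  # Locate the first integer or decimal in each element (-1 if none)
  m <- regexpr("[0-9]+(\\.[0-9]+)?", x)
  hit <- !is.na(m) & m > 0
  # Elements without a match stay NA
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)
  as.numeric(out)
}

# Sample values copied from column A of the question's data
A <- c("0..8", "1.2.", "5.4 ( 6.30 PM )", "0.6 {2.00 AM}", "1-0",
       "76 MMOL", "2.3 (Re-Checked)", "72 MMOL \\L", "<0.3")
cleaned <- extract_first_number(A)
print(cleaned)
```

To overwrite the column in place you would assign the result back, for example df$A <- extract_first_number(df$A).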

Data

df <- structure(list(pateint_id = 1001:1010, NAME = c("kam", "kam", 
"kam", "sat", "sat", "sat", "bas", "bas", "bas", "bas"), A = c("0..8", 
"0..8", "1.2.", "5.4 ( 6.30 PM )", "0.6 {2.00 AM}", "1-0", "76 MMOL", 
"2.3 (Re-Checked)", "72 MMOL \\L", "<0.3"), country = c("IND", 
"IND", "IND", "IND", "IND", "IND", "IND", "IND", "IND", "IND"
)), class = "data.frame", row.names = c(NA, -10L))