How to replace numbers with random values in R?

I want to change the numbers in a data frame to random values (digits).

x = c("010-1234-5678",
      "John 010-8888-8888",
      "Phone: 010-1111-2222",
      "Peter 018.1111.3333",
      "Year(2007,2019,2020)",
      "Alice 01077776666")

df = data.frame(
  phoneNumber = x
)

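For example, John 836-3816-9361 is the kind of output I want (random digits), and I want to change the other numbers too, the ones that go with Phone, Peter, and so on.

I only got as far as typing random <- sample(1:9,1), and I don't know what the next step is.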

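You can try this base R approach. We extract the phone numbers, mask them with 11 random digits, and then put them back in their original positions. Note that this approach does not change any of the numbers in "Year(2007,2019,2020)", in case that is the behavior you want.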

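rand_mask <- function(x) {
  # Locate the first phone-like number in each string: 3-4-4 digits with an
  # optional "-" or "." separator that must be the same in both places (\\1).
  m1 <- regexpr("\\b\\d{3}([-.]?)\\d{4}\\1\\d{4}\\b", x)
  phones <- regmatches(x, m1)
  # Locate every digit inside the extracted phone numbers.
  m2 <- gregexpr("\\d", phones)
  # Draw 11 random digits (0-9) for each phone number.
  rand <- replicate(length(m2), sample.int(10L, 11L, replace = TRUE) - 1L, simplify = FALSE)
  regmatches(phones, m2) <- rand
  # Put the masked numbers back where they were found.
  regmatches(x, m1) <- phones
  x
}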

Test

> set.seed(1234L)
> rand_mask(x)
[1] "954-8453-1659"        "John 537-3347-3723"   "Phone: 941-7326-8253" "Peter 791.4505.7250"  "Year(2007,2019,2020)"
[6] "Alice 08790795282"   
> rand_mask(x)
[1] "589-6578-2214"        "John 796-5386-2547"   "Phone: 036-2830-5595" "Peter 440.1591.4329"  "Year(2007,2019,2020)"
[6] "Alice 34955828570"   
> rand_mask(x)
[1] "815-7526-4562"        "John 268-7295-6144"   "Phone: 827-2728-3732" "Peter 960.8214.9580"  "Year(2007,2019,2020)"
[6] "Alice 28834025451"  
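The extract-and-put-back step relies on `regmatches()` and its replacement form `regmatches<-`. As a minimal sketch of just that mechanism, here is a two-element vector with a simplified pattern; the name `v` and the `###` placeholder are made up for illustration and are not part of the answer above.

```r
v <- c("John 010-8888-8888", "Year(2007,2019,2020)")

# regexpr() returns -1 for elements without a match (the second one here).
m <- regexpr("\\d{3}-\\d{4}-\\d{4}", v)

# regmatches() returns only the matched substrings, so `hits` has length 1.
hits <- regmatches(v, m)

# The replacement form takes one value per matched element and leaves
# unmatched elements untouched.
regmatches(v, m) <- "###-####-####"
v
```

Because unmatched elements are skipped on both the extract and the assign side, the "Year(...)" string passes through unchanged.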

Update

This one replaces every digit with a random digit.

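rand_mask2 <- function(x) {
  # Locate every digit in every string.
  m <- gregexpr("\\d", x)
  # For each string, draw as many random digits (0-9) as it contains.
  regmatches(x, m) <- lapply(lengths(m), \(n) sample.int(10L, n, replace = TRUE) - 1L)
  x
}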

Test

> set.seed(1234L)
> rand_mask2(x)
[1] "954-8453-1659"        "John 537-3347-3723"   "Phone: 941-7326-8253" "Peter 791.4505.7250"  "Year(0879,0795,2825)"
[6] "Alice 89657822147"   
> rand_mask2(x)
[1] "965-3862-5470"        "John 362-8305-5954"   "Phone: 401-5914-3293" "Peter 495.5828.5708"  "Year(1575,2645,6226)"
[6] "Alice 87295614482"   
> rand_mask2(x)
[1] "727-2837-3296"        "John 082-1495-8028"   "Phone: 834-0254-5121" "Peter 656.7306.9729"  "Year(1707,7594,3930)"
[6] "Alice 55233202082"  
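One way to check that only the digits change is to compare the input and output with every digit blanked out. The sketch below repeats the function so that it runs on its own; the helper name `skeleton` is hypothetical, and the seed is arbitrary.

```r
x <- c("010-1234-5678",
       "John 010-8888-8888",
       "Phone: 010-1111-2222",
       "Peter 018.1111.3333",
       "Year(2007,2019,2020)",
       "Alice 01077776666")

rand_mask2 <- function(x) {
  m <- gregexpr("\\d", x)
  regmatches(x, m) <- lapply(lengths(m), function(n) sample.int(10L, n, replace = TRUE) - 1L)
  x
}

# Hypothetical helper: replace every digit with "#" so that two strings
# compare equal exactly when they differ only in their digits.
skeleton <- function(s) gsub("[0-9]", "#", s)

set.seed(1L)
masked <- rand_mask2(x)

# Letters, punctuation and string lengths should all be preserved.
same_layout <- identical(skeleton(masked), skeleton(x))
same_layout
```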

You can use stringr::str_replace_all(), which can apply a function to the regex matches.

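library(dplyr)
library(stringr)

set.seed(5)

# Replace each run of digits with a random number of the same width,
# left-padded with zeros.
df %>%
  mutate(res = str_replace_all(phoneNumber, "\\d+", \(x) str_pad(sample(10 ^ (nc <- nchar(x)), 1) - 1, nc, pad = "0")))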

           phoneNumber                  res
1        010-1234-5678        888-6858-2254
2   John 010-8888-8888   John 221-3796-1832
3 Phone: 010-1111-2222 Phone: 402-1526-7238
4  Peter 018.1111.3333  Peter 825.3877.4482
5 Year(2007,2019,2020) Year(3599,9035,9012)
6    Alice 01077776666    Alice 90000945625

If you want to replace all the digits, the gsubfn package is useful.

library(gsubfn)
gsubfn("[0-9]", \(x) sample(0:9, 1), df$phoneNumber)

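This replaces every digit with a random digit. Of course, this approach works better if the data is tidy; for example, one might object to mixing names, years, and phone numbers in a single column called phoneNumber.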
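As a rough illustration of the tidy-data point, the phone numbers could first be pulled into a column of their own, so that any masking touches only that column. This is a sketch in base R that reuses the pattern from the first answer; `perl = TRUE`, the `tidy` data frame, and its column names are assumptions added here, not part of the question.

```r
x <- c("010-1234-5678",
       "John 010-8888-8888",
       "Phone: 010-1111-2222",
       "Peter 018.1111.3333",
       "Year(2007,2019,2020)",
       "Alice 01077776666")

# 3-4-4 digits with an optional, consistent "-" or "." separator.
pat <- "\\b\\d{3}([-.]?)\\d{4}\\1\\d{4}\\b"
m <- regexpr(pat, x, perl = TRUE)

# Keep one entry per row; rows without a phone number stay NA.
phone <- rep(NA_character_, length(x))
phone[m > -1L] <- regmatches(x, m)

tidy <- data.frame(raw = x, phone = phone)
tidy
```

With the numbers isolated like this, a digit-replacing function can be applied to `tidy$phone` alone without any risk of touching the years.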