Keep significant zeros when switching a column to character formatting in R

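I am cleaning data in R and want to keep the numeric formatting when switching a column from numeric to character, specifically the significant zeros in the hundredths place (see the example below). My input columns mostly start out as factor data; below is an example of what I am trying to do.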

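I'm sure there is a better way, and I am hoping someone more knowledgeable than me can shed some light. Most questions online deal with leading zeros or with formatting purely numeric columns, but the "<" symbol in my data has thrown me for a loop as to the proper way to do this.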

df      = as.factor(c("0.01","5.231","<0.02","0.30","0.801","2.302"))
ind     = which(df %in% "<0.02")       # Locate the below detection value.
df[ind] <- NA                          # Substitute NA temporarily 
df      = as.numeric(as.character(df)) # Changes to numeric column
df      = round(df, digits = 2)        # Rounds to hundredths place
ind1    = which(df < 0.02)             # Check for below reporting limit values
df      = as.character(df)             # Change back to character column...
df[c(ind,ind1)] = "<0.02"              # so I can place the reporting limit back

> # RESULTS::
> df
[1] "<0.02" "5.23"  "<0.02" "0.3"   "0.8"   "2.3"

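However, the 4th, 5th, and 6th values no longer show the zero in the hundredths place. What is the correct order of operations? Perhaps changing the column back to character is the wrong approach? Any advice would be appreciated.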

Thanks.

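EDIT: ---- Following the suggestions from hrbrmstr and Mike: thanks for the suggestions. I tried the following, and they all lead to the same problem. Perhaps there is another way of indexing/replacing values?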

format, same problem:

#... code from above...
ind1    = which(df < 0.02)
df      = as.character(df)
df[!c(ind,ind1)] = format(df[!c(ind,ind1)],digits=2,nsmall=2)
> df
[1] "<0.02" "5.23"  "<0.02" "0.3 "  "0.8 "  "2.3 " 

sprintf, same problem:

# ... above code from example ...
ind1 = which(df < 0.02)   # Check for below reporting limit values.
sprintf("%.2f",df)        # sprintf attempt.
[1] "0.01" "5.23" "NA"   "0.30" "0.80" "2.30"
df[c(ind,ind1)] = "<0.02" # Feed the symbols back into the column.
> df
[1] "<0.02" "5.23"  "<0.02" "0.3"   "0.8"   "2.3"  #Same Problem.

I tried another way of replacing the values, but still hit the same problem.

# ... above code from example ...
> ind1    = which(df < 0.02)
> df[c(ind,ind1)] = 9999999
> sprintf("%.2f",df)
[1] "9999999.00" "5.23"       "9999999.00" "0.30"       "0.80"       "2.30" 
> gsub("9999999.00","<0.02",df)
[1] "<0.02" "5.23"  "<0.02" "0.3"   "0.8"   "2.3"  #Same Problem.

You could pad it using gsub and some regex...

df <- c("<0.02", "5.23", "<0.02", "0.3", "4",  "0.8",   "2.3")

gsub("^([^.]+)$", "\\1.00", gsub("\\.(\\d)$", ".\\10", df))

[1] "<0.02" "5.23"  "<0.02" "0.30"  "4.00"  "0.80"  "2.30" 

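The inner gsub (which is evaluated first) looks for a dot followed by a single digit and the end of the string, and replaces the digit (capture group \1) with itself followed by a zero. The outer gsub then looks for numbers that contain no dot and appends .00 to them. Note that inside an R string literal each backslash has to be doubled, so the backreference is written "\\1".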
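
If you would rather avoid the regex, here is a minimal sketch of another route. In the sprintf attempt in the question, the result of sprintf() is printed but never assigned back to df, so df is still numeric when "<0.02" is put back in, and R then coerces the whole vector to character and drops the trailing zeros. Building the character vector with sprintf() first and replacing the below-limit entries afterwards should keep the zeros. The helper name format_with_limit and its arguments limit and digits are made up for illustration; they are not from the original posts.

```r
# Hypothetical helper (not from the original posts): format values to a fixed
# number of decimals as character, then restore the reporting-limit label.
format_with_limit <- function(x, limit = 0.02, digits = 2) {
  x     <- as.character(x)                  # factor -> character
  below <- startsWith(x, "<")               # entries already flagged as below the limit
  num   <- suppressWarnings(as.numeric(x))  # flagged entries become NA here
  num   <- round(num, digits)
  below <- below | (!is.na(num) & num < limit)  # also flag numeric values under the limit
  fmt   <- paste0("%.", digits, "f")
  out   <- sprintf(fmt, num)                # character vector, trailing zeros kept
  out[below] <- paste0("<", sprintf(fmt, limit))
  out
}

raw <- as.factor(c("0.01", "5.231", "<0.02", "0.30", "0.801", "2.302"))
res <- format_with_limit(raw)
print(res)
# [1] "<0.02" "5.23"  "<0.02" "0.30"  "0.80"  "2.30"
```

The key point is the order: do all the numeric work (rounding, comparing against the limit) first, convert to character once with sprintf(), and only then write the "<0.02" label into the already-character vector.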