R: Converting structure of a dataframe into the same structure of another dataframe

I currently have 2 large data frames with more than 300,000 observations and over 100 variables each, but for simplicity let's say I have df1:

> str(df1)
'data.frame':   3000 obs. of  3 variables:
 $ Name         : chr  "AAA" "BBB" "CCC" "DDD" ...
 $ DateTime     : POSIXct, format: "2014-01-01 00:00:00" "2014-01-01 00:10:00" "2014-01-01 00:20:00" ...
 $ Age          : num  27 25 27 30 ...

df2:

> str(df2)
'data.frame':   3000 obs. of  3 variables:
 $ HEX          : Factor w/ 500 levels "AAA","BBB",..: 100 100 100 100 ...
 $ DateTime     : Factor w/ 3000 levels "2014-01-01 00:00:00",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Age          : Factor w/ 500 levels "27","25",..: 100 100 100 100 ...

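Both data frames contain the same values and have the same number of rows and columns. Only their structure differs: every column in df2 is a factor.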

I want to convert df2 so that it has the same structure as df1. Please advise, thanks in advance.

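Assuming the columns of both data frames are in exactly the same order as described, you can use Map to convert each df2 column to the class of the corresponding df1 column: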

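df2[] <- Map(function(x, y) {      # x: a df2 column, y: class of the matching df1 column
  if (any(grepl("POS", y)))        # POSIXct / POSIXt
    ISOdate(as.Date(x), 0, 0, 0)   # keeps the date only, at 00:00 GMT; time of day is dropped
  else if (y == "Date")
    as.Date(x)
  else                             # factor labels first (not the integer codes),
    `class<-`(as.character(x), y)  # then `class<-` coerces to "character", "numeric", ...
  }, df2, lapply(df1, class))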
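
The ISOdate() branch above keeps only the calendar date. That works for the midnight timestamps in the demonstration data, but it would drop the 10-minute times in the question's DateTime column. A minimal sketch that keeps the time of day, assuming the columns line up by position, passes the df1 columns themselves (not just their classes) and parses date-times with as.POSIXct() in the template column's time zone. The helper name match_column_types() is hypothetical and not part of any package.

```r
# Hypothetical helper, not from the answer above: convert every column of
# `target` to the type of the column at the same position in `template`.
# Assumes both data frames have the same columns in the same order.
match_column_types <- function(target, template) {
  stopifnot(ncol(target) == ncol(template))
  target[] <- Map(function(x, ref) {
    x <- as.character(x)  # use the factor labels, never the integer codes
    if (inherits(ref, "POSIXct")) {
      # Keep the full date-time; parse it in the template's time zone if it
      # has one, otherwise in the session's local time zone ("")
      tz <- attr(ref, "tzone")
      if (is.null(tz)) tz <- ""
      as.POSIXct(x, tz = tz[1])
    } else if (inherits(ref, "Date")) {
      as.Date(x)
    } else if (is.factor(ref)) {
      factor(x, levels = levels(ref))
    } else {
      # Basic vector types; anything else is left as character
      switch(class(ref)[1],
             numeric = as.numeric(x),
             integer = as.integer(x),
             logical = as.logical(x),
             x)
    }
  }, target, template)
  target
}
```

Usage would be `df2 <- match_column_types(df2, df1)`. With the demonstration data, df1's date column has no tzone attribute, so df2's strings (e.g. "2020-01-01 01:00:00") are read in the session's local time zone. Whether they then coincide with df1's instants depends on that time zone.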

Demonstration

Before

lapply(df1, class)
# $name
# [1] "character"
# 
# $date
# [1] "POSIXct" "POSIXt" 
# 
# $age
# [1] "numeric"
# 
# $date2
# [1] "Date"

lapply(df2, class)
# $HEX
# [1] "factor"
# 
# $date
# [1] "factor"
# 
# $age
# [1] "factor"
# 
# $date2
# [1] "factor"
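
Because every df2 column is a factor, its values are stored as integer codes pointing into the levels. Converting with as.numeric() directly therefore returns the codes rather than the ages, which is why the conversion goes through as.character() first. A small illustration (the vector f is made up for the example):

```r
# A factor stores integer codes plus a table of labels (its levels)
f <- factor(c("30", "27", "25"))
levels(f)                     # "25" "27" "30" (levels are sorted)
as.numeric(f)                 # 3 2 1: the level codes, not the ages
as.numeric(as.character(f))   # 30 27 25: the actual values
as.numeric(levels(f))[f]      # 30 27 25: same result, converting only the levels
```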

Transformation

df2[] <- Map(function(x, y) {
  if (any(grepl("POS", y)))
    ISOdate(as.Date(x), 0, 0, 0)
  else if (y == "Date")
    as.Date(x)
  else
    `class<-`(as.character(x), y)
  }, df2, lapply(df1, class))

After

lapply(df2, class)
# $HEX
# [1] "character"
# 
# $date
# [1] "POSIXct" "POSIXt" 
# 
# $age
# [1] "numeric"
# 
# $date2
# [1] "Date"
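
To confirm the conversion without reading the two listings side by side, the class lists can be compared position by position while ignoring the differing column names (name vs HEX). A minimal sketch; same_column_classes() and mismatched_columns() are hypothetical helpers, not functions from base R or any package:

```r
# Hypothetical helpers: compare column classes by position, ignoring names

# TRUE when every column of `a` has the same class vector as the column
# at the same position in `b`
same_column_classes <- function(a, b) {
  identical(unname(lapply(a, class)), unname(lapply(b, class)))
}

# Positions (named after `a`'s columns) whose classes differ
mismatched_columns <- function(a, b) {
  which(!mapply(identical, lapply(a, class), lapply(b, class)))
}
```

For the demonstration data, running same_column_classes(df1, df2) after the transformation should give TRUE, since the two class listings above match.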

Data

df1 <- structure(list(name = c("A", "B", "C", "D", "E"), date = structure(c(1577836800, 
1580515200, 1583020800, 1585699200, 1588291200), class = c("POSIXct", 
"POSIXt")), age = c(30, 27, 25, 28, 23), date2 = structure(c(18262, 
18293, 18322, 18353, 18383), class = "Date")), row.names = c(NA, 
-5L), class = "data.frame")

df2 <- structure(list(HEX = structure(1:5, .Label = c("A", "B", "C", 
"D", "E"), class = "factor"), date = structure(1:5, .Label = c("2020-01-01 01:00:00", 
"2020-02-01 01:00:00", "2020-03-01 01:00:00", "2020-04-01 02:00:00", 
"2020-05-01 02:00:00"), class = "factor"), age = structure(c(5L, 
3L, 2L, 4L, 1L), .Label = c("23", "25", "27", "28", "30"), class = "factor"), 
    date2 = structure(1:5, .Label = c("2020-01-01", "2020-02-01", 
    "2020-03-01", "2020-04-01", "2020-05-01"), class = "factor")), row.names = c(NA, 
-5L), class = "data.frame")