Data in Long and Wide Format, Need to Convert to Just Long in R
I'm working with a dataset that is in both wide and long format. It looks like:
ID week1 week2 week3 ... week12
1 2 NA NA ... NA
1 NA 3 NA ... NA
1 NA NA 3 ... NA
...
1 NA NA NA ... 4
2 4 NA NA ... NA
2 NA 5 NA ... NA
2 NA NA 3 ... NA
I'm now struggling to convert it to long format only for analysis. I'd like it set up as:
ID week value
1 1 2
1 2 3
1 3 3
...
1 12 4
2 1 4
2 2 5
2 3 3
Can anyone offer suggestions on how to do this in R? I've tried reshape2 and dplyr/tidyr, but when I select the ID variable I always end up with too many observations.
How about this:
library(dplyr)

# small data sample
df <- read.table(text = 'ID week1 week2 week3 week4
1 2 NA NA NA
1 NA 3 NA NA
1 NA NA 3 NA
1 NA NA NA 4
2 4 NA NA NA
2 NA 5 NA NA
2 NA NA 3 NA', header = TRUE)

# melt() from data.table expects a data.table, so convert the data frame first
df %>%
  data.table::as.data.table() %>%
  data.table::melt(id.vars = 'ID') %>%
  na.omit()
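The "too many observations" symptom from the question can be made visible with a small base R sketch. It assumes the same four-week sample as above; the names week_cols and stacked are illustrative only. Stacking every week column produces one row per original row and week column, and most of those rows are NA until they are dropped.

```r
# Same small sample as in the answer above
df <- read.table(text = "ID week1 week2 week3 week4
1 2 NA NA NA
1 NA 3 NA NA
1 NA NA 3 NA
1 NA NA NA 4
2 4 NA NA NA
2 NA 5 NA NA
2 NA NA 3 NA", header = TRUE)

week_cols <- setdiff(names(df), "ID")

# One row per (original row, week column) pair, stacked column by column
stacked <- data.frame(
  ID    = rep(df$ID, times = length(week_cols)),
  week  = rep(week_cols, each = nrow(df)),
  value = unlist(df[week_cols], use.names = FALSE)
)

nrow(stacked)              # 7 original rows * 4 week columns
sum(is.na(stacked$value))  # rows that carry no measurement
nrow(na.omit(stacked))     # rows left after dropping the NA rows
```

Dropping the NA rows is what brings the result back to one row per actual measurement.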
1) gather
Using wide, shown in reproducible form in Note 1 at the end, use gather to convert it to long format, drop the NA rows, and sort.
library(dplyr)
library(tidyr)
wide %>%
  gather("week", "value", -ID) %>%
  drop_na %>%
  arrange(ID, week)
giving:
ID week value
1 1 week1 2
2 1 week2 3
3 1 week3 3
4 1 week4 4
5 2 week1 4
6 2 week2 5
7 2 week3 3
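In the output above, week holds strings such as week1. With twelve weeks, sorting on those strings would place week10 before week2. As a possible variation (not part of the original answer), pivot_longer from tidyr can strip the prefix and store week as an integer. This sketch assumes tidyr 1.1.0 or later for names_transform; the name long is illustrative.

```r
library(dplyr)
library(tidyr)

wide <- read.table(text = "ID week1 week2 week3 week4
1 2 NA NA NA
1 NA 3 NA NA
1 NA NA 3 NA
1 NA NA NA 4
2 4 NA NA NA
2 NA 5 NA NA
2 NA NA 3 NA", header = TRUE)

long <- wide %>%
  pivot_longer(
    cols = -ID,                                 # every column except ID
    names_to = "week",
    names_prefix = "week",                      # "week3" becomes "3"
    names_transform = list(week = as.integer),  # then 3L
    values_to = "value",
    values_drop_na = TRUE                       # drop the NA cells
  ) %>%
  arrange(ID, week)

long
```

Because week is numeric here, arrange(ID, week) orders week 10 after week 9 rather than after week 1.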
2) reshape
Using only base R:
varying <- list(value = 2:5)
long <- na.omit(reshape(wide, dir = "long", timevar = "week",
                        varying = varying, v.names = names(varying)))[1:3]
long[order(long$ID, long$week), ]
giving:
ID week value
1.1 1 1 2
2.2 1 2 3
3.3 1 3 3
4.4 1 4 4
5.1 2 1 4
6.2 2 2 5
7.3 2 3 3
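The hard-coded 2:5 in varying only fits the four-week sample. A hedged variation for a real file with week1 through week12 is to pick the columns by name pattern and pass the week numbers through times. The names week_cols and long_base are illustrative, and the sketch assumes every measurement column is named week followed by digits.

```r
wide <- read.table(text = "ID week1 week2 week3 week4
1 2 NA NA NA
1 NA 3 NA NA
1 NA NA 3 NA
1 NA NA NA 4
2 4 NA NA NA
2 NA 5 NA NA
2 NA NA 3 NA", header = TRUE)

# Select the measurement columns by name instead of by position
week_cols <- grep("^week\\d+$", names(wide), value = TRUE)

long_base <- reshape(
  wide,
  direction = "long",
  varying   = list(value = week_cols),
  v.names   = "value",
  timevar   = "week",
  times     = as.integer(sub("^week", "", week_cols))  # 1, 2, 3, ...
)

# Drop NA rows, keep the three columns of interest, sort, tidy row names
long_base <- na.omit(long_base)[c("ID", "week", "value")]
long_base <- long_base[order(long_base$ID, long_base$week), ]
rownames(long_base) <- NULL
long_base
```

Resetting the row names removes labels such as 1.1 and 2.2 that reshape attaches to the long result.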
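3) data.table
Using varying from (2), we can use melt from data.table. Note that we could specify either id.vars or measure.vars, but as pointed out in the comments we may want to generalize this to multiple variables, and the measure.vars approach is the one that generalizes.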
library(data.table)
longDT <- na.omit(melt(as.data.table(wide), measure.vars = varying,
                       variable.name = "week"))
setkey(longDT, ID, week)
longDT
giving:
ID week value
1: 1 week1 2
2: 1 week2 3
3: 1 week3 3
4: 1 week4 4
5: 2 week1 4
6: 2 week2 5
7: 2 week3 3
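To illustrate the remark that either id.vars or measure.vars can be specified for the single-variable case, here is a small sketch. The names wideDT, by_id and by_measure are illustrative, and measure.vars is given as a plain character vector rather than the list used above. Both calls should produce the same long table.

```r
library(data.table)

wideDT <- as.data.table(read.table(text = "ID week1 week2 week3 week4
1 2 NA NA NA
1 NA 3 NA NA
1 NA NA 3 NA
1 NA NA NA 4
2 4 NA NA NA
2 NA 5 NA NA
2 NA NA 3 NA", header = TRUE))

# Name the id column; every other column is melted
by_id <- melt(wideDT, id.vars = "ID",
              variable.name = "week", na.rm = TRUE)

# Name the measure columns; every other column is kept as an id
week_cols <- grep("^week", names(wideDT), value = TRUE)
by_measure <- melt(wideDT, measure.vars = week_cols,
                   variable.name = "week", na.rm = TRUE)

all.equal(by_id, by_measure)
```

Here na.rm = TRUE takes the place of the separate na.omit call.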
Note 1
The input used, in reproducible form, is:
Lines <- "
ID week1 week2 week3 week4
1 2 NA NA NA
1 NA 3 NA NA
1 NA NA 3 NA
1 NA NA NA 4
2 4 NA NA NA
2 NA 5 NA NA
2 NA NA 3 NA"
wide <- read.table(text = Lines, header = TRUE)
Note 2
Regarding multiple variables, melt in data.table supports this. Suppose we have the following:
Lines2 <- "
ID week1var1 week1var2 week2var1 week2var2 week3var1 week3var2 week4var1 week4var2
1 1 2 20 NA NA NA NA NA NA
2 1 NA NA 3 30 NA NA NA NA
3 1 NA NA NA NA 3 30 NA NA
4 1 NA NA NA NA NA NA 4 40
5 2 4 40 NA NA NA NA NA NA
6 2 NA NA 5 50 NA NA NA NA
7 2 NA NA NA NA 3 30 NA NA"
wide2 <- read.table(text = Lines2, header = TRUE)
library(data.table)
# group the measure columns by their variable suffix (var1, var2)
varying2 <- split(names(wide2)[-1],
                  sub("(.*\\d)(\\D.*)", "\\2", names(wide2)[-1]))
longDT2 <- na.omit(melt(as.data.table(wide2), measure.vars = varying2,
                        variable.name = "week"))
setkey(longDT2, ID, week)
longDT2
giving:
ID week var1 var2
1: 1 1 2 20
2: 1 2 3 30
3: 1 3 3 30
4: 1 4 4 40
5: 2 1 4 40
6: 2 2 5 50
7: 2 3 3 30
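For comparison, the same multi-variable reshape can be sketched with tidyr instead of data.table. This is an alternative, not the approach used in the answer. It assumes tidyr 1.1.0 or later and column names of the form week<number>var<number>; the name long2 is illustrative. The special ".value" entry in names_to tells pivot_longer to turn the second captured group into output column names.

```r
library(tidyr)

Lines2 <- "
ID week1var1 week1var2 week2var1 week2var2 week3var1 week3var2 week4var1 week4var2
1 1 2 20 NA NA NA NA NA NA
2 1 NA NA 3 30 NA NA NA NA
3 1 NA NA NA NA 3 30 NA NA
4 1 NA NA NA NA NA NA 4 40
5 2 4 40 NA NA NA NA NA NA
6 2 NA NA 5 50 NA NA NA NA
7 2 NA NA NA NA 3 30 NA NA"
wide2 <- read.table(text = Lines2, header = TRUE)

long2 <- pivot_longer(
  wide2,
  cols = -ID,
  names_to = c("week", ".value"),              # week number, then var1 / var2
  names_pattern = "week(\\d+)(var\\d+)",
  names_transform = list(week = as.integer),
  values_drop_na = TRUE                        # drop weeks with no data at all
)

long2
```

The result has one row per ID and week, with var1 and var2 as separate columns.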