Expand data.frame by creating duplicates based on group condition (2)
Starting from @AndrewGustar's answer/code:
1) What if the ID values in my input data.frame are not in order and can also repeat?
Example data.frame:
df = read.table(text = 'ID Day Count Count_group
18 1933 6 11
33 1933 6 11
37 1933 6 11
18 1933 6 11
16 1933 6 11
11 1933 6 11
111 1932 5 8
34 1932 5 8
60 1932 5 8
88 1932 5 8
18 1932 5 8
33 1931 3 4
13 1931 3 4
56 1931 3 4
23 1930 1 1
6 1800 6 10
37 1800 6 10
98 1800 6 10
52 1800 6 10
18 1800 6 10
76 1800 6 10
55 1799 4 6
6 1799 4 6
52 1799 4 6
133 1799 4 6
112 1798 2 2
677 1798 2 2
778 888 4 6
111 888 4 6
88 888 4 6
10 888 4 6
37 887 2 3
26 887 2 3
8 886 1 2
56 885 1 1', header = TRUE)
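The Count column shows the total number of ID values for each Day, and the Count_group column shows the total number of ID values for each Day and Day - 1 combined.
For example, Day 1933 has Count_group = 11 because Count 6 (1933) + Count 5 (1932), and so on.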
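To make the arithmetic concrete, here is a minimal sketch that recomputes Count and Count_group for the first block of days from the example. The data frame name `toy` and the helper vectors are assumptions for illustration only, and a missing Day - 1 is assumed to contribute 0.

```r
# Toy data: one row per observation; ID values are unordered and may repeat
toy <- data.frame(
  ID  = c(18, 33, 37, 18, 16, 11, 111, 34, 60, 88, 18, 33, 13, 56, 23),
  Day = c(rep(1933, 6), rep(1932, 5), rep(1931, 3), 1930)
)

# Number of rows per distinct day
days      <- sort(unique(toy$Day))
n_per_day <- as.integer(table(factor(toy$Day, levels = days)))

# Count: rows observed on the same Day
toy$Count <- n_per_day[match(toy$Day, days)]

# Rows observed on Day - 1 (assumed 0 when that day is absent)
prev <- n_per_day[match(toy$Day - 1, days)]
prev[is.na(prev)] <- 0L

# Count_group: Count of Day plus Count of Day - 1
toy$Count_group <- toy$Count + prev

unique(toy[, c("Day", "Count", "Count_group")])
```

For these rows the distinct Count_group values come out as 11, 8, 4 and 1, matching the example data.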
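What I need to do is create duplicated observations for each Count_group and add them to it, so that each Count_group shows both its Day and its Day - 1.
For example, Count_group = 11 is made up of the Count values of Day 1933 and 1932, so both days need to be included in Count_group = 11. The next one is Count_group = 8, made up of 1932 and 1931, and so on.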
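The rule just described can be sketched directly on a small toy data frame. This is only an illustration under stated assumptions: the helper name `expand_by_prev_day` is hypothetical, each Day is assumed to occupy one contiguous run of rows, and the toy Count and Count_group values are made up to fit two days.

```r
# Toy input: two consecutive days, IDs unordered and repeated
toy <- data.frame(
  ID          = c(18, 33, 18, 111, 34),
  Day         = c(1933, 1933, 1933, 1932, 1932),
  Count       = c(3, 3, 3, 2, 2),
  Count_group = c(5, 5, 5, 2, 2)
)

# Hypothetical helper: for every Day d (in order of appearance), keep its own
# rows and append a copy of the Day d - 1 rows relabelled with d's Count_group
expand_by_prev_day <- function(df) {
  pieces <- lapply(unique(df$Day), function(d) {
    own  <- df[df$Day == d, ]
    prev <- df[df$Day == d - 1, ]
    if (nrow(prev) > 0) prev$Count_group <- own$Count_group[1]
    rbind(own, prev)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL  # reset the row names altered by rbind
  out
}

res <- expand_by_prev_day(toy)
res
```

Here the two rows of Day 1932 appear twice: once relabelled with Count_group 5 (the group of Day 1933) and once with their own Count_group 2.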
Desired output:
ID Day Count Count_group
18 1933 6 11
33 1933 6 11
37 1933 6 11
18 1933 6 11
16 1933 6 11
11 1933 6 11
111 1932 5 11
34 1932 5 11
60 1932 5 11
88 1932 5 11
18 1932 5 11
111 1932 5 8
34 1932 5 8
60 1932 5 8
88 1932 5 8
18 1932 5 8
33 1931 3 8
13 1931 3 8
56 1931 3 8
33 1931 3 4
13 1931 3 4
56 1931 3 4
23 1930 1 4
23 1930 1 1
6 1800 6 10
37 1800 6 10
98 1800 6 10
52 1800 6 10
18 1800 6 10
76 1800 6 10
55 1799 4 10
6 1799 4 10
52 1799 4 10
133 1799 4 10
55 1799 4 6
6 1799 4 6
52 1799 4 6
133 1799 4 6
112 1798 2 6
677 1798 2 6
112 1798 2 2
677 1798 2 2
778 888 4 6
111 888 4 6
88 888 4 6
10 888 4 6
37 887 2 6
26 887 2 6
37 887 2 3
26 887 2 3
8 886 1 3
8 886 1 2
56 885 1 2
56 885 1 1
Here is a solution that preserves the ID values above.
# first add grouping variables
df$smalldaygroup <- c(0, cumsum(sapply(2:nrow(df), function(i) df$Day[i] != df$Day[i-1])))    # individual days
df$bigdaygroup   <- c(0, cumsum(sapply(2:nrow(df), function(i) df$Day[i] < df$Day[i-1] - 1))) # blocks of consecutive days
# duplicate individual days except the first in each big group
df2 <- lapply(split(df, df$bigdaygroup), function(x)
  split(x, x$smalldaygroup)[c(1, rep(2:length(split(x, x$smalldaygroup)), each = 2))])
# change the Count_group to previous value in alternate entries
df2 <- lapply(df2, function(L) lapply(1:length(L), function(i) {
  x <- L[[i]]
  if (!(i %% 2)) x$Count_group <- L[[i-1]]$Count_group[1]
  return(x)
}))
df2 <- do.call(rbind, unlist(df2, recursive = FALSE)) # bind back together
head(df2, 20) # ignore rownames!
ID Day Count Count_group
01.1 18 1933 6 11
01.2 33 1933 6 11
01.3 37 1933 6 11
01.4 18 1933 6 11
01.5 16 1933 6 11
01.6 11 1933 6 11
02.7 111 1932 5 11
02.8 34 1932 5 11
02.9 60 1932 5 11
02.10 88 1932 5 11
02.11 18 1932 5 11
03.7 111 1932 5 8
03.8 34 1932 5 8
03.9 60 1932 5 8
03.10 88 1932 5 8
03.11 18 1932 5 8
04.12 33 1931 3 8
04.13 13 1931 3 8
04.14 56 1931 3 8
05.12 33 1931 3 4
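The two grouping variables used in the solution above can also be built without `sapply`. The following is a small sketch on a hypothetical vector `day` (not the full example data), using `diff` to compare each row with the previous one; it is intended to be equivalent to the row-by-row comparison, not a replacement taken from the original answer.

```r
# Hypothetical Day column: two blocks of consecutive days
# (1933..1931, then 1800..1799)
day <- c(1933, 1933, 1932, 1931, 1931, 1800, 1799, 1799)

# Change between each row and the previous one
step <- diff(day)

# A new small group starts whenever the day changes at all
smalldaygroup <- c(0, cumsum(step != 0))

# A new big group starts whenever the day drops by more than one
bigdaygroup <- c(0, cumsum(step < -1))

data.frame(day, smalldaygroup, bigdaygroup)
```

Rows sharing a day get the same `smalldaygroup`, and the jump from 1931 to 1800 is the only place where `bigdaygroup` increases.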