Reshape dataset based on multiple columns
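I have reshaped data before, but the cells could always be identified by two variables. That is not possible with my current data. An excerpt of my data is shown below. The full data set covers more countries and years.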
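Country | Fear of Crime | Total | 2007 | 2009 | 2010 |
---|---|---|---|---|---|
Argentina | All or almost all the time | 37 | 37 | 33 | 27 |
Argentina | Sometimes | 34 | 42 | 35 | 40 |
Argentina | Occasionally | 18 | 14 | 23 | 23 |
Argentina | Never | 11 | 6 | 8 | 10 |
Argentina | Don't know/No answer | 0 | 1 | 1 | 0 |
Bolivia | All or almost all the time | 38 | 35 | 36 | 34 |
Bolivia | Sometimes | 36 | 40 | 41 | 40 |
Bolivia | Occasionally | 17 | 17 | 18 | 18 |
Bolivia | Never | 8 | 6 | 4 | 6 |
Bolivia | Don't know/No answer | 1 | 1 | 0 | 1 |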
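I need the data in this format:
Year | Country | All or almost all the time | Sometimes | Occasionally | Never | Don't know/No answer |
---|---|---|---|---|---|---|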
Does anyone have a solution? Many thanks!
library(dplyr)
library(tidyr)
dat %>%
  pivot_longer(
    cols = -c(Country, `Fear of Crime`),
    names_to = "Year"
  ) %>%
  pivot_wider(
    id_cols = c(Year, Country),
    names_from = `Fear of Crime`,
    values_from = value
  )
# A tibble: 6 x 7
# Year Country All Sometimes Occasionally Never `Don't know`
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 2007 Argentina 52.0 29.7 52.1 34.2 59.9
#2 2009 Argentina 52.8 38.1 42.0 73.5 42.9
#3 2010 Argentina 56.2 64.6 31.0 71.6 32.1
#4 2007 Bolivia 36.8 37.4 31.4 45.0 56.3
#5 2009 Bolivia 53.2 52.8 62.8 56.1 59.9
#6 2010 Bolivia 42.4 45.1 67.4 55.0 58.1
Data:
dat <- tibble(
  Country = rep(c("Argentina", "Bolivia"), each = 5),
  `Fear of Crime` = rep(c("All", "Sometimes", "Occasionally", "Never", "Don't know"), 2),
  `2007` = rnorm(10, 50, 10),
  `2009` = rnorm(10, 50, 10),
  `2010` = rnorm(10, 50, 10)
)
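The random data above only demonstrates the shape of the result. To check the approach against the numbers in the question, the excerpt can be typed in by hand. The following is a minimal sketch under two assumptions that are not in the original posts: the object name `fear` and the English column names and category labels are made up for illustration, and the `Total` column is dropped so that `Year` holds only real years.

```r
library(dplyr)
library(tidyr)

# Excerpt from the question, entered by hand.
# The name `fear` and the column names are assumptions for illustration.
fear <- tibble(
  Country = rep(c("Argentina", "Bolivia"), each = 5),
  `Fear of Crime` = rep(
    c("All or almost all the time", "Sometimes", "Occasionally",
      "Never", "Don't know/No answer"),
    times = 2
  ),
  Total  = c(37, 34, 18, 11, 0, 38, 36, 17, 8, 1),
  `2007` = c(37, 42, 14, 6, 1, 35, 40, 17, 6, 1),
  `2009` = c(33, 35, 23, 8, 1, 36, 41, 18, 4, 0),
  `2010` = c(27, 40, 23, 10, 0, 34, 40, 18, 6, 1)
)

wide <- fear %>%
  select(-Total) %>%                           # drop the aggregate column
  pivot_longer(
    cols = -c(Country, `Fear of Crime`),
    names_to = "Year",
    names_transform = list(Year = as.integer)  # store years as integers
  ) %>%
  pivot_wider(
    id_cols = c(Year, Country),
    names_from = `Fear of Crime`,
    values_from = value
  )

wide
```

Removing the `select(-Total)` step and the `names_transform` argument keeps `Total` as an extra value of `Year`, which then has to stay a character column.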
You can also use the following solution. I added the TOTAL values to the Year variable and made a slight change to Fear_of_Crime so that all values are in title case:
library(dplyr)   # needed for mutate()
library(tidyr)
library(stringr)
df %>%
  pivot_longer(TOTAL:X2010, names_to = "Year", names_prefix = "X?") %>%
  mutate(Fear_of_Crime = str_to_title(Fear_of_Crime)) %>%
  pivot_wider(names_from = Fear_of_Crime, values_from = value)
# A tibble: 8 x 7
Country Year All_or_almost_the_time Sometimes Occasionally Never `Don´T_know/No_answer`
<chr> <chr> <int> <int> <int> <int> <int>
1 Argentina TOTAL 37 34 18 11 0
2 Argentina 2007 37 42 14 6 1
3 Argentina 2009 33 35 23 8 1
4 Argentina 2010 27 40 23 10 0
5 Bolivia TOTAL 38 36 17 8 1
6 Bolivia 2007 35 40 17 6 1
7 Bolivia 2009 36 41 18 4 0
8 Bolivia 2010 34 40 18 6 1
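The `names_prefix = "X?"` argument in the code above is a regular expression that is removed from the start of each column name. The answer does not say how `df` was read, but the `X` prefix is presumably the one base R adds when a header starts with a digit. The small sketch below uses only base R to illustrate that assumption; the vector names are made up for illustration.

```r
# Headers as they might appear in the source file (an assumption)
raw_names <- c("TOTAL", "2007", "2009", "2010")

# make.names() is what read.csv() applies when check.names = TRUE;
# names starting with a digit get an "X" prepended
syntactic <- make.names(raw_names)
syntactic

# Removing an optional leading "X" restores the original labels,
# which is the effect names_prefix = "X?" has on the Year values
stripped <- sub("^X?", "", syntactic)
stripped
```

Note that the pattern would also strip a leading `X` from a column whose name genuinely starts with that letter, so it is only safe when no such column is among those being pivoted.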
Using data.table:
library(data.table)
dcast(melt(setDT(df), id.vars = c('Country', 'Fear of Crime')),
      Country + variable ~ `Fear of Crime`, value.var = 'value')
# Country variable All Don't know Never Occasionally Sometimes
#1: Argentina 2007 61.35123 51.64059 52.90937 56.06212 51.27404
#2: Argentina 2009 49.97756 48.41825 67.63133 35.55390 46.89938
#3: Argentina 2010 54.35360 57.17569 43.49386 54.01240 59.79714
#4: Argentina Total 77.11749 66.08187 57.24466 82.39351 89.21991
#5: Bolivia 2007 53.28061 49.66029 39.45862 54.87632 36.10037
#6: Bolivia 2009 32.22393 44.43537 56.89622 58.62973 40.09476
#7: Bolivia 2010 43.81035 43.07929 39.85770 57.56582 49.35075
#8: Bolivia Total 97.55278 57.06825 72.95792 87.52021 59.40992
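In the output above the year column keeps melt's default name, `variable`. As a hedged variation on the same approach, `melt()` accepts `variable.name` to name that column directly. The sketch below uses the numbers from the question; the object names `fear_dt`, `long_dt` and `wide_dt` and the English labels are assumptions for illustration.

```r
library(data.table)

# Excerpt from the question, entered by hand (names are assumptions)
fear_dt <- data.table(
  Country = rep(c("Argentina", "Bolivia"), each = 5),
  `Fear of Crime` = rep(
    c("All or almost all the time", "Sometimes", "Occasionally",
      "Never", "Don't know/No answer"),
    times = 2
  ),
  Total  = c(37, 34, 18, 11, 0, 38, 36, 17, 8, 1),
  `2007` = c(37, 42, 14, 6, 1, 35, 40, 17, 6, 1),
  `2009` = c(33, 35, 23, 8, 1, 36, 41, 18, 4, 0),
  `2010` = c(27, 40, 23, 10, 0, 34, 40, 18, 6, 1)
)

# Long format: one row per Country / Fear of Crime / Year
long_dt <- melt(
  fear_dt,
  id.vars = c("Country", "Fear of Crime"),
  variable.name = "Year",      # instead of the default "variable"
  variable.factor = FALSE      # keep Year as character, not factor
)

# Wide format: one row per Country / Year, one column per answer category
wide_dt <- dcast(long_dt, Country + Year ~ `Fear of Crime`,
                 value.var = "value")
wide_dt
```

Here `Total` is kept as one of the `Year` values, as in the answer; filtering `long_dt[Year != "Total"]` before `dcast()` would leave only the actual years.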