Calculate the percentage share of values relative to a value held in another row of the data frame
I want to calculate percentage shares and create new columns using mutate. I have the following data:
country, metric, segment, value1990, value2000, value2010
canada, abc, rural, 10, 15, 16
canada, abc, urban, 12, 12, 18
canada, abc, total, 22, 27, 34
canada, xyz, rural, 6, 9, 10
canada, xyc, urban, 7, 8, 8
canada, xyc, total, 13, 17, 18
canada, population, rural, 80, 86, 95
canada, population, urban, 102, 110, 121
canada, population, total, 182, 196, 216
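The data frame contains data for multiple countries and several years. I want to create new columns with the following values:
country, metric, segment, value1990, value2000, value2010, percent1990, percent2000, percent2010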
canada, abc, rural, 10, 15, 16, 12.5%, 17.4%, 16.8%
canada, abc, urban, 12, 12, 18, 11.7%, 10.9%, 14.8%
canada, abc, total, 22, 27, 34, 12.1%, 13.7%, 15.7%
canada, xyz, rural, 6, 9, 10, 7.5%, 10.4%, 10.5%
canada, xyc, urban, 7, 8, 8, 6.8%, 7.2%, 6.6%
canada, xyc, total, 13, 17, 18, 7.1%, 8.6%, 8.3%
canada, population, rural, 80, 86, 95, 100%, 100%, 100%
canada, population, urban, 102, 110, 121, 100%, 100%, 100%
canada, population, total, 182, 196, 216, 100%, 100%, 100%
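Essentially, I want to calculate each value's percentage share of the population for the matching segment (rural/urban/total), for every year.
For example, for 1990:
(row 1) percent_share = (10/80)*100 = 12.5%
(row 2) percent_share = (12/102)*100 = 11.76%
(row 3) percent_share = (22/182)*100 = 12.09%
I can't get past the group_by step of the chain to work out what to put inside mutate: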
df = df %>%
group_by (country, metric) %>%
mutate(...)
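Edit: for the new question data that includes years.
This is easier if you move the years and the population into columns of their own. Here is one approach.
Assuming your example data is in a data frame named df1, first gather the years into a single column: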
library(dplyr)
library(tidyr)
df1 <- df1 %>% gather(Year, Value, 4:6)
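In recent tidyr releases gather() is superseded by pivot_longer(). The following is a minimal sketch of the same reshaping step, not part of the original answer. It assumes the question's example data is rebuilt by hand with tribble(); the name long is only illustrative.

```r
library(dplyr)
library(tidyr)

# Example data from the question, rebuilt by hand (labels copied as posted)
df1 <- tribble(
  ~country, ~metric,      ~segment, ~value1990, ~value2000, ~value2010,
  "canada", "abc",        "rural",   10,  15,  16,
  "canada", "abc",        "urban",   12,  12,  18,
  "canada", "abc",        "total",   22,  27,  34,
  "canada", "xyz",        "rural",    6,   9,  10,
  "canada", "xyc",        "urban",    7,   8,   8,
  "canada", "xyc",        "total",   13,  17,  18,
  "canada", "population", "rural",   80,  86,  95,
  "canada", "population", "urban",  102, 110, 121,
  "canada", "population", "total",  182, 196, 216
)

# Stack the three value columns into Year/Value pairs,
# selecting them by name instead of by position (4:6)
long <- df1 %>%
  pivot_longer(cols = starts_with("value"),
               names_to = "Year",
               values_to = "Value")
```

The rows come out in a different order than with gather() (all years for one input row, then the next row), but the content is the same, so the join on country, segment and Year is unaffected.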
Then filter the rows where metric == "population" and join them back to the remaining rows of the data:
df1 %>% filter(metric == "population") %>%
left_join(filter(df1, metric != "population"),
by = c("country", "segment", "Year")) %>%
select(country, segment, Year, population = Value.x, metric = metric.y, value = Value.y)
Result:
country segment Year population metric value
1 canada rural value1990 80 abc 10
2 canada rural value1990 80 xyz 6
3 canada urban value1990 102 abc 12
4 canada urban value1990 102 xyc 7
5 canada total value1990 182 abc 22
6 canada total value1990 182 xyc 13
7 canada rural value2000 86 abc 15
8 canada rural value2000 86 xyz 9
9 canada urban value2000 110 abc 12
10 canada urban value2000 110 xyc 8
11 canada total value2000 196 abc 27
12 canada total value2000 196 xyc 17
13 canada rural value2010 95 abc 16
14 canada rural value2010 95 xyz 10
15 canada urban value2010 121 abc 18
16 canada urban value2010 121 xyc 8
17 canada total value2010 216 abc 34
18 canada total value2010 216 xyc 18
Then add a mutate to compute the share:
df1 %>% filter(metric == "population") %>%
left_join(filter(df1, metric != "population"),
by = c("country", "segment", "Year")) %>%
select(country, segment, Year, population = Value.x, metric = metric.y, value = Value.y) %>%
mutate(percent_share = 100 * (value / population))
Result:
country segment Year population metric value percent_share
1 canada rural value1990 80 abc 10 12.500000
2 canada rural value1990 80 xyz 6 7.500000
3 canada urban value1990 102 abc 12 11.764706
4 canada urban value1990 102 xyc 7 6.862745
5 canada total value1990 182 abc 22 12.087912
6 canada total value1990 182 xyc 13 7.142857
7 canada rural value2000 86 abc 15 17.441860
8 canada rural value2000 86 xyz 9 10.465116
9 canada urban value2000 110 abc 12 10.909091
10 canada urban value2000 110 xyc 8 7.272727
11 canada total value2000 196 abc 27 13.775510
12 canada total value2000 196 xyc 17 8.673469
13 canada rural value2010 95 abc 16 16.842105
14 canada rural value2010 95 xyz 10 10.526316
15 canada urban value2010 121 abc 18 14.876033
16 canada urban value2010 121 xyc 8 6.611570
17 canada total value2010 216 abc 34 15.740741
18 canada total value2010 216 xyc 18 8.333333
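The question asks for one percent column per year next to the original value columns. A possible way to get there from the long result is to spread it back out with pivot_wider(). This is only a sketch under assumptions: the data is rebuilt by hand, and the names long, shares and wide are illustrative. The population rows themselves are not part of the output, because they were used as the denominator.

```r
library(dplyr)
library(tidyr)

# Example data from the question, rebuilt by hand (labels copied as posted)
df1 <- tribble(
  ~country, ~metric,      ~segment, ~value1990, ~value2000, ~value2010,
  "canada", "abc",        "rural",   10,  15,  16,
  "canada", "abc",        "urban",   12,  12,  18,
  "canada", "abc",        "total",   22,  27,  34,
  "canada", "xyz",        "rural",    6,   9,  10,
  "canada", "xyc",        "urban",    7,   8,   8,
  "canada", "xyc",        "total",   13,  17,  18,
  "canada", "population", "rural",   80,  86,  95,
  "canada", "population", "urban",  102, 110, 121,
  "canada", "population", "total",  182, 196, 216
)

# Long format, as in the gather() step
long <- df1 %>% gather(Year, Value, 4:6)

# Join each metric to the population of the same country, segment and year
shares <- long %>%
  filter(metric == "population") %>%
  left_join(filter(long, metric != "population"),
            by = c("country", "segment", "Year")) %>%
  select(country, segment, Year, population = Value.x,
         metric = metric.y, value = Value.y) %>%
  mutate(percent_share = 100 * (value / population))

# Back to one row per country/metric/segment,
# with value<year> and percent<year> columns
wide <- shares %>%
  mutate(Year = sub("value", "", Year)) %>%
  select(country, metric, segment, Year, value, percent = percent_share) %>%
  pivot_wider(names_from = Year,
              values_from = c(value, percent),
              names_glue = "{.value}{Year}")
```

If the percentages should be shown with a fixed number of decimals, round() can be applied to percent_share before widening.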
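For the original data with a single value column, you can also just group by country and segment and divide by max(value), since the population value should be the largest in each group. Note that this gives the share as a fraction rather than a percentage: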
df %>%
group_by(country, segment) %>%
mutate(percent_share = value / max(value))
# A tibble: 9 x 5
# Groups: country, segment [3]
country metric segment value percent_share
<chr> <chr> <chr> <dbl> <dbl>
1 canada abc rural 10 0.125
2 canada abc urban 12 0.118
3 canada abc total 22 0.121
4 canada xyz rural 6 0.075
5 canada xyc urban 7 0.0686
6 canada xyc total 13 0.0714
7 canada population rural 80 1
8 canada population urban 102 1
9 canada population total 182 1
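Dividing by max(value) relies on the population always being the largest value in its group. A variant that looks up the population row explicitly, and handles all the year columns at once with across(), might look like the sketch below. It is not from the original answers; it assumes exactly one population row per country and segment, and the percent_ column names and the name result are illustrative.

```r
library(dplyr)

# Example data from the question, rebuilt by hand (labels copied as posted)
df <- tribble(
  ~country, ~metric,      ~segment, ~value1990, ~value2000, ~value2010,
  "canada", "abc",        "rural",   10,  15,  16,
  "canada", "abc",        "urban",   12,  12,  18,
  "canada", "abc",        "total",   22,  27,  34,
  "canada", "xyz",        "rural",    6,   9,  10,
  "canada", "xyc",        "urban",    7,   8,   8,
  "canada", "xyc",        "total",   13,  17,  18,
  "canada", "population", "rural",   80,  86,  95,
  "canada", "population", "urban",  102, 110, 121,
  "canada", "population", "total",  182, 196, 216
)

result <- df %>%
  group_by(country, segment) %>%
  # Within each country/segment group, divide every value column by the
  # value in the row where metric is "population" (assumed to occur once)
  mutate(across(starts_with("value"),
                ~ 100 * .x / .x[metric == "population"],
                .names = "percent_{.col}")) %>%
  ungroup()
```

This keeps the data in its wide layout and adds percent_value1990, percent_value2000 and percent_value2010; the population rows come out as 100.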