How to replace an NA with an equivalent value from elsewhere in a dataset?
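I tried looking for a similar question but couldn't find one. If you know of one, please let me know!
I have been working on a project studying staple grains.
Here is a subset of my dataset: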
nutrient.component. grain nutrients
1 Beta-carotene (μg) White Rice 0.00
2 Beta-carotene (μg) Brown Rice NA
3 Calcium (mg) White Rice 28.00
4 Calcium (mg) Brown Rice 23.00
5 Carbohydrates (g) White Rice 80.00
6 Carbohydrates (g) Brown Rice 77.00
7 Copper (mg) White Rice 0.22
8 Copper (mg) Brown Rice NA
9 Energy (kJ) White Rice 1528.00
10 Energy (kJ) Brown Rice 1549.00
11 Fat (g) White Rice 0.66
12 Fat (g) Brown Rice 2.92
13 Fiber (g) White Rice 1.30
14 Fiber (g) Brown Rice 3.50
15 Folate Total (B9) (μg) White Rice 8.00
16 Folate Total (B9) (μg) Brown Rice 20.00
17 Iron (mg) White Rice 0.80
18 Iron (mg) Brown Rice 1.47
19 Lutein+zeaxanthin (μg) White Rice 0.00
20 Lutein+zeaxanthin (μg) Brown Rice NA
21 Magnesium (mg) White Rice 25.00
22 Magnesium (mg) Brown Rice 143.00
23 Manganese (mg) White Rice 1.09
24 Manganese (mg) Brown Rice 3.74
25 Monounsaturated fatty acids (g) White Rice 0.21
26 Monounsaturated fatty acids (g) Brown Rice 1.05
27 Niacin (B3) (mg) White Rice 1.60
28 Niacin (B3) (mg) Brown Rice 5.09
29 Pantothenic acid (B5) (mg) White Rice 1.01
30 Pantothenic acid (B5) (mg) Brown Rice 1.49
31 Phosphorus (mg) White Rice 115.00
32 Phosphorus (mg) Brown Rice 333.00
33 Polyunsaturated fatty acids (g) White Rice 0.18
34 Polyunsaturated fatty acids (g) Brown Rice 1.04
35 Potassium (mg) White Rice 115.00
36 Potassium (mg) Brown Rice 223.00
37 Protein (g) White Rice 7.10
38 Protein (g) Brown Rice 7.90
39 Riboflavin (B2)(mg) White Rice 0.05
40 Riboflavin (B2)(mg) Brown Rice 0.09
41 Saturated fatty acids (g) White Rice 0.18
42 Saturated fatty acids (g) Brown Rice 0.58
43 Selenium (μg) White Rice 15.10
44 Selenium (μg) Brown Rice NA
45 Sodium (mg) White Rice 5.00
46 Sodium (mg) Brown Rice 7.00
47 Sugar (g) White Rice 0.12
48 Sugar (g) Brown Rice 0.85
49 Thiamin (B1)(mg) White Rice 0.07
50 Thiamin (B1)(mg) Brown Rice 0.40
51 Vitamin A (IU) White Rice 0.00
52 Vitamin A (IU) Brown Rice 0.00
53 Vitamin B6 (mg) White Rice 0.16
54 Vitamin B6 (mg) Brown Rice 0.51
55 Vitamin C (mg) White Rice 0.00
56 Vitamin C (mg) Brown Rice 0.00
57 Vitamin E, alpha-tocopherol (mg) White Rice 0.11
58 Vitamin E, alpha-tocopherol (mg) Brown Rice 0.59
59 Vitamin K1 (μg) White Rice 0.10
60 Vitamin K1 (μg) Brown Rice 1.90
61 Water (g) White Rice 12.00
62 Water (g) Brown Rice 10.00
63 Zinc (mg) White Rice 1.09
64 Zinc (mg) Brown Rice 2.02
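Brown Rice has four NA values.
Based on this graph, I think it is fair to assume that the NA values for Brown Rice would be very close to the equivalent White Rice values, and that mirroring the White Rice values would be more accurate than converting the NAs to zero.
My question is: other than manually looking up and typing in the White Rice equivalent for each Brown Rice nutrient, how can code replace an NA with the equivalent White Rice value? For example, I would like the NA for Copper (mg) / Brown Rice to become the same value as Copper (mg) / White Rice (0.22). Would it be better to replace the NAs with zeros first? If I did that, though, I would have six nutrients with a value of zero instead of four with NA. I am trying to find the right way of thinking about solving this in code. Any insight would be much appreciated.
Thanks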
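I am assuming that your dataset is of class data.frame and that it is named dat.
I believe the code below does it. It splits the data frame into a list of pieces with 2 rows or 1 row each (in the example, the last nutrient is missing its Brown Rice row). It then checks whether each piece has 2 rows and whether the Brown Rice nutrient value is NA. If so, it assigns the White Rice value. Finally, the resulting list is collected back into a data.frame.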
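# Split the data frame into one piece per nutrient
sp <- split(dat, dat$nutrient.component.)
res <- lapply(sp, function(x) {
  brown <- x$grain == "Brown Rice"
  white <- x$grain == "White Rice"
  # Only act on complete White/Brown pairs whose Brown Rice value is missing
  if (nrow(x) == 2 && sum(brown) == 1 && sum(white) == 1 &&
      is.na(x$nutrients[brown]))
    x$nutrients[brown] <- x$nutrients[white]  # copy the value, not the label
  x
})
rm(sp)  # tidy up
res <- do.call(rbind, res)  # collect the pieces back into one data.frame
res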
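The same lookup can also be written without splitting the data frame. The following is only a sketch under assumptions: the four-row `dat` is a cut-down copy of the question's data for illustration, and the helper names `white`, `needs_fill`, and `idx` are not from the original answer.

```r
# Cut-down copy of the question's data, for illustration only
dat <- data.frame(
  nutrient.component. = c("Calcium (mg)", "Calcium (mg)",
                          "Copper (mg)", "Copper (mg)"),
  grain = c("White Rice", "Brown Rice", "White Rice", "Brown Rice"),
  nutrients = c(28, 23, 0.22, NA),
  stringsAsFactors = FALSE
)

# Lookup table: the White Rice rows only
white <- dat[dat$grain == "White Rice", ]

# Rows that need filling: Brown Rice with a missing value
needs_fill <- dat$grain == "Brown Rice" & is.na(dat$nutrients)

# For each such row, find the White Rice row of the same nutrient
idx <- match(dat$nutrient.component.[needs_fill], white$nutrient.component.)

# Copy the White Rice value across; an unmatched nutrient simply stays NA
dat$nutrients[needs_fill] <- white$nutrients[idx]

dat
```

Because `match()` works on the nutrient name, this does not depend on row order or on every nutrient having exactly two rows.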
The zoo package has some useful functions for dealing with NA:
library(data.table)
setDT(DT)[, nutrients := zoo::na.aggregate(nutrients), by = nutrient.component][]
nutrient.component grain nutrients
1: Beta-carotene (µg) White Rice 0.00
2: Beta-carotene (µg) Brown Rice 0.00
3: Calcium (mg) White Rice 28.00
4: Calcium (mg) Brown Rice 23.00
5: Carbohydrates (g) White Rice 80.00
6: Carbohydrates (g) Brown Rice 77.00
7: Copper (mg) White Rice 0.22
8: Copper (mg) Brown Rice 0.22
9: Energy (kJ) White Rice 1528.00
10: Energy (kJ) Brown Rice 1549.00
11: Fat (g) White Rice 0.66
12: Fat (g) Brown Rice 2.92
13: Fiber (g) White Rice 1.30
14: Fiber (g) Brown Rice 3.50
15: Folate Total (B9) (µg) White Rice 8.00
16: Folate Total (B9) (µg) Brown Rice 20.00
17: Iron (mg) White Rice 0.80
18: Iron (mg) Brown Rice 1.47
19: Lutein+zeaxanthin (µg) White Rice 0.00
20: Lutein+zeaxanthin (µg) Brown Rice 0.00
...
Note rows 2, 8, and 20.
data.table is used here because it updates DT in place, which avoids copying the whole table and so saves memory and time.
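One point worth keeping in mind: by default `zoo::na.aggregate()` replaces each NA with the mean of the non-missing values in its group. Here that equals the White Rice value only because each nutrient has exactly one other row. The sketch below shows the same group-mean idea in base R, without extra packages; the helper `fill_mean` and the small `dat` are assumptions made for illustration, not part of the answer above.

```r
# Replace the NAs in a numeric vector with the mean of its non-missing values
# (hypothetical helper, mimicking the default behaviour of zoo::na.aggregate)
fill_mean <- function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}

# Cut-down copy of the question's data, for illustration only
dat <- data.frame(
  nutrient.component. = c("Calcium (mg)", "Calcium (mg)",
                          "Copper (mg)", "Copper (mg)"),
  grain = c("White Rice", "Brown Rice", "White Rice", "Brown Rice"),
  nutrients = c(28, 23, 0.22, NA),
  stringsAsFactors = FALSE
)

# ave() applies fill_mean within each nutrient and keeps the row order
dat$nutrients <- ave(dat$nutrients, dat$nutrient.component., FUN = fill_mean)
dat$nutrients
# 28.00 23.00  0.22  0.22

# With more than one known value in a group, the NA becomes their average
# (made-up numbers)
toy <- fill_mean(c(1, 3, NA))
toy
# 1 3 2
```

If a third grain were added later, a group-mean fill would no longer copy the White Rice value, so an explicit lookup may be the safer choice in that case.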
Assuming your input data frame is called dt, we can use the fill function from the tidyr package to do this. dt2 is the final output.
library(tidyr)
dt2 <- dt %>% fill(nutrients)
dt2
nutrient.component. grain nutrients
1 1 Beta-carotene (µg) White Rice 0.00
2 2 Beta-carotene (µg) Brown Rice 0.00
3 3 Calcium (mg) White Rice 28.00
4 4 Calcium (mg) Brown Rice 23.00
5 5 Carbohydrates (g) White Rice 80.00
6 6 Carbohydrates (g) Brown Rice 77.00
7 7 Copper (mg) White Rice 0.22
8 8 Copper (mg) Brown Rice 0.22
...
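By default, fill imputes each NA from the nearest preceding non-NA row. So it is important to make sure that each Brown Rice record sits on the row immediately after its related White Rice record.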
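If the row order cannot be guaranteed, one option is to fill within each nutrient rather than down the whole column. This is a sketch, assuming the dplyr package is also available and a tidyr version that supports `.direction = "downup"`; the deliberately shuffled `dt` is a cut-down copy of the question's data for illustration.

```r
library(dplyr)
library(tidyr)

# Cut-down copy of the question's data with the rows shuffled on purpose
dt <- data.frame(
  nutrient.component. = c("Copper (mg)", "Calcium (mg)",
                          "Copper (mg)", "Calcium (mg)"),
  grain = c("Brown Rice", "White Rice", "White Rice", "Brown Rice"),
  nutrients = c(NA, 28, 0.22, 23),
  stringsAsFactors = FALSE
)

dt2 <- dt %>%
  group_by(nutrient.component.) %>%            # fill only within a nutrient
  fill(nutrients, .direction = "downup") %>%   # look down first, then up
  ungroup()

dt2
```

Grouping keeps a value from one nutrient from leaking into the next, and "downup" covers the case where the Brown Rice row comes before the White Rice row.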