How to quantify character values in one column by using a translation table?
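I have a data file in which I want to convert a column from strings/categories to numbers. I also have a pre-made file that lists about 500 different categories together with the number each one should become.
So my first file looks like this: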
Type_of_fruit
Banana
Apple
Apple
Kiwi
Passionfruit
Banana
Apple
Orange
Etc.
Then I have a second table that looks like this (the translation table):
Banana | 1
Apple | 2
Kiwi | 3
Passionfruit | 4
Orange | 5
Mango | 6
Grape | 7
Etc.
I want to use this translation table to create a new, quantified column in my original data frame:
Type_of_fruit_quantified
1
2
2
3
4
1
2
5
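At first I thought of doing it with mutate, e.g.
mutate(Type_of_fruit_quantified = if_else(Type_of_fruit == "Banana", 1, if_else(Type_of_fruit == "Apple", 2, ...)))
and so on.
However, the translation table has about 500 different categories, so writing one if_else per category would take a very long time. How can I do this faster, for example by referencing the translation table directly?
To recreate my mock data: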
Type_of_fruit <- c("Banana", "Apple", "Apple", "Kiwi", "Passionfruit", "Banana", "Apple", "Orange")
Type_of_fruit_df <- data.frame(Type_of_fruit)
Fruit <- c("Banana", "Apple", "Kiwi", "Passionfruit", "Orange", "Mango", "Grape")
Number <- c(1, 2, 3, 4, 5, 6, 7)
Translation_table <- data.frame(Fruit, Number)
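Referencing the translation table directly is what base R's match() does: it returns, for each fruit in the data, its row position in the translation table, and indexing Number by those positions performs the lookup. A minimal sketch, not taken from the original thread, that rebuilds the mock data so it runs on its own and needs no extra package:

```r
# Mock data from the question
Type_of_fruit_df <- data.frame(Type_of_fruit = c("Banana", "Apple", "Apple", "Kiwi",
                                                 "Passionfruit", "Banana", "Apple", "Orange"))
Translation_table <- data.frame(Fruit = c("Banana", "Apple", "Kiwi", "Passionfruit",
                                          "Orange", "Mango", "Grape"),
                                Number = c(1, 2, 3, 4, 5, 6, 7))

# Position of each fruit in Translation_table$Fruit (NA if the fruit is not listed)
idx <- match(Type_of_fruit_df$Type_of_fruit, Translation_table$Fruit)

# Use those positions to pick the matching numbers, keeping the original row order
Type_of_fruit_df$Type_of_fruit_quantified <- Translation_table$Number[idx]

# Type_of_fruit_quantified is now 1, 2, 2, 3, 4, 1, 2, 5
print(Type_of_fruit_df)
```

Categories missing from the translation table end up as NA, so anyNA(idx) is a quick check that all ~500 categories were covered.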
Rename the column in Type_of_fruit_df so that both tables share the column name Fruit, then join them with dplyr::left_join (see ?dplyr::left_join):
library(dplyr)
Type_of_fruit <- c("Banana", "Apple", "Apple", "Kiwi", "Passionfruit", "Banana", "Apple", "Orange")
# Name the column Fruit so it matches the key column of the translation table
Type_of_fruit_df <- data.frame(Fruit = Type_of_fruit)
Fruit <- c("Banana", "Apple", "Kiwi", "Passionfruit", "Orange", "Mango", "Grape")
Number <- c(1, 2, 3, 4, 5, 6, 7)
Translation_table <- data.frame(Fruit, Number)
> left_join(Type_of_fruit_df, Translation_table, by = "Fruit")
Fruit Number
1 Banana 1
2 Apple 2
3 Apple 2
4 Kiwi 3
5 Passionfruit 4
6 Banana 1
7 Apple 2
8 Orange 5
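Renaming is not strictly required: left_join also accepts a named vector in by that pairs differently named key columns, and rename() can then give the number column the name asked for in the question. A minimal sketch under that assumption (it is a variation, not the answer's original code), with an anti_join() check for categories the translation table does not cover:

```r
library(dplyr)

# Mock data from the question, keeping the original column name Type_of_fruit
Type_of_fruit_df <- data.frame(Type_of_fruit = c("Banana", "Apple", "Apple", "Kiwi",
                                                 "Passionfruit", "Banana", "Apple", "Orange"))
Translation_table <- data.frame(Fruit = c("Banana", "Apple", "Kiwi", "Passionfruit",
                                          "Orange", "Mango", "Grape"),
                                Number = c(1, 2, 3, 4, 5, 6, 7))

# Join Type_of_fruit (left table) to Fruit (translation table), then rename Number
result <- Type_of_fruit_df %>%
  left_join(Translation_table, by = c("Type_of_fruit" = "Fruit")) %>%
  rename(Type_of_fruit_quantified = Number)

# Rows whose category has no entry in the translation table (they get NA above)
unmatched <- anti_join(Type_of_fruit_df, Translation_table,
                       by = c("Type_of_fruit" = "Fruit"))

print(result)
print(nrow(unmatched))
```

One design caveat: a left join returns one output row per matching row in the translation table, so if a category appeared twice in the 500-entry file, the corresponding data rows would be duplicated. Keeping the Fruit column unique avoids that.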