Split a list of strings between two points
c("1x Tomatoes 1kg R 16", "1x Oyster Mushroom R 20", "1x Potatoes 1 kg R 15")
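I have a long list like this, and I need to split each string between the x and the R so that I get the same number of columns when I build a data frame. I can't simply split on spaces, because not every item in the list is a two-word product; some are 2-4 words long, so splitting on spaces won't work.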
Edit:
This is the actual file I am trying to filter some useless words out of in order to build a data frame:
1x Tomatoes 1kg for R 16 each
1x Oyster Mushroom for R 20 each
1x Potatoes 1 kg for R 15 each
1x Stirfry 400g for R 20 each
2x Red apples 4 medium for R 10 each
1x beef Fillet Steak 300g for R 54 each
1x Beef Rump Steak 300g for R 45 each
1x Back Bacon 200g for R 30 each
1x Gouda 1kg for R 130 each
1x Chicken flattie lemon and herb for R 85 each
2x Lean Beef Mince for R 54 each
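I know the list has no useful pattern and is a mess, but thanks for the help.
I'm now thinking that splitting between the x and the R is not the best approach, because other products contain a capital R.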
Try this:
chr <- c("1x Tomatoes 1kg R 16", "1x Oyster Mushroom R 20", "1x Potatoes 1 kg R 15")
strsplit(chr, "(?<=x) | (?=R)", perl = TRUE)
If you need to convert the result into a data frame, then:
as.data.frame(do.call(rbind, strsplit(chr, "(?<=x) | (?=R)", perl = TRUE)))
If you are using R v4.0.2, the match should be case-sensitive.
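The concern raised in the question, that some product names contain a capital R, can be checked against this pattern. The sketch below is only an illustration: the second string is adapted from the question's file (with "for" and "each" dropped), and `pieces` is a name chosen here.

```r
# Two strings in the simplified "Nx product R price" form; the second has a
# capital R inside the product name (adapted from the question's file).
chr <- c("1x Tomatoes 1kg R 16", "1x Beef Rump Steak 300g R 45")

# Same split as above: on the space after "x", or on a space followed by "R".
pieces <- strsplit(chr, "(?<=x) | (?=R)", perl = TRUE)

# Number of pieces per string; anything other than 3 will not line up
# as a row of a three-column data frame.
lengths(pieces)
```

The second string comes back in four pieces, because the space before "Rump" also satisfies the "followed by R" lookahead. Rows like that would not bind cleanly with `do.call(rbind, ...)`.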
Update
How about this?
as.data.frame(do.call(rbind, strsplit(sub(" each$", "", vec), "(?<=\\dx) | for ", perl = TRUE)))
Input
> vec
[1] "1x Tomatoes 1kg for R 16 each" "1x Oyster Mushroom for R 20 each" "1x Potatoes 1 kg for R 15 each"
[4] "1x Stirfry 400g for R 20 each" "2x Red apples 4 medium for R 10 each" "1x beef Fillet Steak 300g for R 54 each"
[7] "1x Beef Rump Steak 300g for R 45 each" "1x Back Bacon 200g for R 30 each" "1x Gouda 1kg for R 130 each"
[10] "1x Chicken flattie lemon and herb for R 85 each" "2x Lean Beef Mince for R 54 each"
Output
   V1                             V2    V3
1  1x                   Tomatoes 1kg  R 16
2  1x                Oyster Mushroom  R 20
3  1x                  Potatoes 1 kg  R 15
4  1x                   Stirfry 400g  R 20
5  2x            Red apples 4 medium  R 10
6  1x         beef Fillet Steak 300g  R 54
7  1x           Beef Rump Steak 300g  R 45
8  1x                Back Bacon 200g  R 30
9  1x                      Gouda 1kg R 130
10 1x Chicken flattie lemon and herb  R 85
11 2x                Lean Beef Mince  R 54
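If numeric quantities and prices are wanted rather than the strings "1x" and "R 16", the three columns can be converted after the split. This is a sketch rather than part of the answer above: the column names `quantity`, `product`, and `price` are made up for illustration, only three lines of the file are used, and it assumes every line has the form "Nx product for R price each".

```r
# A few lines from the question's file.
vec <- c("1x Tomatoes 1kg for R 16 each",
         "2x Red apples 4 medium for R 10 each",
         "1x Beef Rump Steak 300g for R 45 each")

# Same split as in the answer: drop the trailing " each", then split on the
# space after "<digit>x" and on " for ".
parts <- do.call(rbind,
                 strsplit(sub(" each$", "", vec), "(?<=\\dx) | for ", perl = TRUE))

# Strip the "x" and "R " decorations and convert to numbers.
# Column names are hypothetical.
df <- data.frame(
  quantity = as.integer(sub("x$", "", parts[, 1])),
  product  = parts[, 2],
  price    = as.numeric(sub("^R ", "", parts[, 3])),
  stringsAsFactors = FALSE
)
df
```

Splitting on " for " instead of on a bare "R" is what keeps "Red apples" and "Beef Rump Steak" in one piece.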
Update: if you always have "for R ...", change the approach below to
matches <- stringr::str_match(string, "x\\s*(.*?)\\s*for R")[,2]
You can try str_match from the {stringr} package like this:
string <- c("1x Tomatoes 1kg R 16", "1x Oyster Mushroom R 20", "1x Potatoes 1 kg R 15")
matches <- stringr::str_match(string, "x\\s*(.*?)\\s*R")[,2]
matches
#> [1] "Tomatoes 1kg" "Oyster Mushroom" "Potatoes 1 kg"
Created on 2020-10-19 by the reprex package (v0.3.0)
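For reference, the same capture-group idea can be written without stringr. The sketch below uses base R's `regexec()` and `regmatches()` on a few lines from the question's file and captures quantity, product, and price in one pass. The pattern and the names `items`, `pattern`, and `fields` are assumptions made here, and it presumes every line ends in "for R <number> each".

```r
# A few lines from the question's file.
items <- c("1x Tomatoes 1kg for R 16 each",
           "2x Red apples 4 medium for R 10 each",
           "1x Beef Rump Steak 300g for R 45 each")

# Anchored pattern: leading count + "x", lazy product name, then the literal
# "for R", the price digits, and " each" at the end of the line.
pattern <- "^(\\d+)x\\s+(.*?)\\s+for R\\s*(\\d+) each$"

# regexec() reports the full match plus one entry per capture group;
# regmatches() extracts the corresponding substrings for each line.
m <- regmatches(items, regexec(pattern, items, perl = TRUE))

# Bind into a character matrix and drop the first column (the full match).
fields <- do.call(rbind, m)[, -1, drop = FALSE]
colnames(fields) <- c("quantity", "product", "price")
fields
```

Because the pattern is anchored on "for R" followed by digits at the end of the line, a capital R inside a product name (as in "Beef Rump Steak") does not end the product capture early.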
This should work (all in base R):
x <- c("1x Tomatoes 1kg R 16", "1x Oyster Mushroom R 20", "1x Potatoes 1 kg R 15")
x <- t(matrix(unlist(strsplit(unlist(strsplit(x, split = ' R ')),
                              split = 'x ')),
              ncol = 3))
x <- as.data.frame(x)