String separated by \n to dataframe
I have the following string:
"Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"
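But I don't know whether it is possible to get a data frame from it. I would like a data frame with two columns (Date and Price), laid out as follows (the Title name is not needed):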
Date Price
Today 1,239 €
Yesterday 1,2 €
17/04/2018 1,2 €
14/04/2018 1,2 €
13/04/2018 1,2 €
12/04/2018 1,2 €
11/04/2018 1,2 €
09/04/2018 1,2 €
08/04/2018 1,2 €
07/04/2018 1,2 €
This is almost the same as what I get with the cat function, but I would like to convert it into a data frame.
Any ideas?
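I'd suggest something like this to convert the string s into a data.frame. The idea is to split date, value, and unit into separate columns: with the unit kept apart from the numeric entries, the data is easier to work with.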
df <- do.call(rbind.data.frame, strsplit(
unlist(strsplit(sub("Title\n", "", s), "\n")),
" "))
colnames(df) <- c("Date", "Value", "Unit");
df$Value <- as.numeric(as.character(sub(",", ".", df$Value)));
# Date Value Unit
#1 Today 1.239 €
#2 Yesterday 1.200 €
#3 17/04/2018 1.200 €
#4 14/04/2018 1.200 €
#5 13/04/2018 1.200 €
#6 12/04/2018 1.200 €
#7 11/04/2018 1.200 €
#8 09/04/2018 1.200 €
#9 08/04/2018 1.200 €
#10 07/04/2018 1.200 €
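Explanation: we first split s on "\n", then split each line on the space character into Date, Value, and Unit. Because your values use a comma as the decimal separator, we replace "," with "." and then convert to numeric.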
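The same steps can be followed one at a time on a shortened string. This is only an illustrative sketch in base R; the names s_demo, rows, fields, value_chr, and value_num are made up for the example and are not part of the answer above.

```r
# Shortened version of the input, for illustration only
s_demo <- "Title\nToday 1,239 €\nYesterday 1,2 €"

# Step 1: split on the newline and drop the first element ("Title")
rows <- strsplit(s_demo, "\n", fixed = TRUE)[[1]][-1]

# Step 2: split every row on a single space into date, value, unit
fields <- strsplit(rows, " ", fixed = TRUE)

# Step 3: take the second field of each row and swap the decimal comma
value_chr <- vapply(fields, `[`, character(1), 2)
value_num <- as.numeric(sub(",", ".", value_chr, fixed = TRUE))

print(rows)
print(value_num)
```

Using fixed = TRUE treats the separators as literal text rather than regular expressions, which is enough here because every separator is a single literal character.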
You can avoid the sub("Title\n", "", s) call (thanks @PoGibas) and make it slightly more compact:
df <- do.call(rbind.data.frame, strsplit(unlist(strsplit(s, "\n"))[-1], " "))
colnames(df) <- c("Date", "Value", "Unit");
df$Value <- as.numeric(as.character(sub(",", ".", df$Value)));
The output is the same as above.
Sample data
s <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"
I used strsplit twice and then built a matrix, which is converted into a data frame:
# Making a short object containing your string
x <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"
# Two string splits (first splitting by "\n" and then by " "), and discarding the "title" (by taking [[1]][2:11])
x <- unlist(strsplit(strsplit(x, split = "\n")[[1]][2:11], split = " "))
# Putting it in a data frame (dropping the € symbol)
df1 <- data.frame(matrix(x, ncol = 3, byrow = T)[,1:2])
Result:
> df1
X1 X2
1 Today 1,239
2 Yesterday 1,2
3 17/04/2018 1,2
4 14/04/2018 1,2
5 13/04/2018 1,2
6 12/04/2018 1,2
7 11/04/2018 1,2
8 09/04/2018 1,2
9 08/04/2018 1,2
10 07/04/2018 1,2
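I would also replace "," with "." so that the values can be converted to numeric:
# x must be the original string here (re-create it if it was overwritten above)
x <- unlist(strsplit(strsplit(x, split = "\n")[[1]][2:11], split = " "))
x <- gsub(",", ".", x)
df1 <- data.frame(matrix(x, ncol = 3, byrow = TRUE)[, 1:2])
# as.character() first, so this works whether the column is a factor or a character vector
df1[, 2] <- as.numeric(as.character(df1[, 2]))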
Here is a solution with read.table:
> read.table(text=str, sep=' ', skip=1, col.names=c('Date', 'Price', 'Currency'))
Date Price Currency
1 Today 1,239 €
2 Yesterday 1,2 €
3 17/04/2018 1,2 €
4 14/04/2018 1,2 €
5 13/04/2018 1,2 €
6 12/04/2018 1,2 €
7 11/04/2018 1,2 €
8 09/04/2018 1,2 €
9 08/04/2018 1,2 €
10 07/04/2018 1,2 €
where str is your data. Note that the skip argument drops the 'Title' line.
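If the Price column should come back as numeric rather than text, read.table also has a dec argument for the decimal separator. The following is a small sketch on a shortened input, not part of the answer above; the variable names s and df are chosen for the example.

```r
# Shortened input, for illustration only
s <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €"

# skip = 1 drops the "Title" line; dec = "," makes read.table parse
# values such as "1,239" as the number 1.239
df <- read.table(text = s, sep = " ", skip = 1, dec = ",",
                 col.names = c("Date", "Price", "Currency"),
                 stringsAsFactors = FALSE)

str(df)
```

The Date column stays as text because it mixes words ("Today") with dates.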
Here is a solution with strsplit and tidyr::separate.
prices <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"
library(tidyr)  # provides separate() and re-exports %>%
prices <- data.frame(x = strsplit(prices, "\n", fixed = TRUE)[[1]])
prices <- prices %>% separate(x, into = c("Date", "Prices"), sep = " ", extra = "merge")
prices <- prices[-1,]
prices
# Date Prices
# 2 Today 1,239 €
# 3 Yesterday 1,2 €
# 4 17/04/2018 1,2 €
# 5 14/04/2018 1,2 €
# 6 13/04/2018 1,2 €
# 7 12/04/2018 1,2 €
# 8 11/04/2018 1,2 €
# 9 09/04/2018 1,2 €
# 10 08/04/2018 1,2 €
# 11 07/04/2018 1,2 €
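With this layout the Prices column still holds text such as "1,239 €". One possible way to turn it into a number, sketched in base R on a small hand-built data frame (the Price column and the amount variable are names introduced only for this example):

```r
# Small stand-in for the result above, for illustration only
prices <- data.frame(
  Date   = c("Today", "Yesterday", "17/04/2018"),
  Prices = c("1,239 €", "1,2 €", "1,2 €"),
  stringsAsFactors = FALSE
)

# Remove the trailing space and currency symbol (anything that is not a digit or comma)
amount <- sub("[^0-9,]+$", "", prices$Prices)

# Swap the decimal comma for a point and convert to numeric
prices$Price <- as.numeric(sub(",", ".", amount, fixed = TRUE))

print(prices)
```

This assumes every entry has the form number, space, currency symbol, as in the sample data; entries with thousands separators would need extra handling.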