String separated by \n to dataframe

I have the following string:

  "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"

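But I don't know whether it is possible to get a data frame out of it. I'd like a data frame with two columns (date and price), laid out as follows (the Title line is not needed):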

Date       Price
Today      1,239 €
Yesterday  1,2 €
17/04/2018 1,2 €
14/04/2018 1,2 €
13/04/2018 1,2 €
12/04/2018 1,2 €
11/04/2018 1,2 €
09/04/2018 1,2 €
08/04/2018 1,2 €
07/04/2018 1,2 €

This is almost exactly what I get with the cat function, but I assume it can be converted into a data frame. Any ideas?

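I'd suggest something like the following to convert the string s into a data.frame. The idea is to split each line into date, value and unit; keeping the numeric value separate from its unit makes the data easier to work with.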

df <- do.call(rbind.data.frame, strsplit(
    unlist(strsplit(sub("Title\n", "", s), "\n")),
    " "))
colnames(df) <- c("Date", "Value", "Unit");
df$Value <- as.numeric(as.character(sub(",", ".", df$Value)));
#         Date Value Unit
#1       Today 1.239    €
#2   Yesterday 1.200    €
#3  17/04/2018 1.200    €
#4  14/04/2018 1.200    €
#5  13/04/2018 1.200    €
#6  12/04/2018 1.200    €
#7  11/04/2018 1.200    €
#8  09/04/2018 1.200    €
#9  08/04/2018 1.200    €
#10 07/04/2018 1.200    €

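Explanation: we first split s on "\n", then split each line on whitespace into Date, Value and Unit. Since your values use a comma as the decimal separator, we replace "," with "." and convert to numeric.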
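The same steps can be spelled out one at a time. This is only a sketch on a shortened copy of the string from the question; the intermediate names rows and fields are made up for illustration and are not part of the answer above.

```r
# Assumed sample input: a shortened version of the string from the question
s <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €"

# Step 1: split on "\n" and drop the first element ("Title")
rows <- strsplit(s, "\n", fixed = TRUE)[[1]][-1]

# Step 2: split each line on a single space into three fields
fields <- strsplit(rows, " ", fixed = TRUE)

# Step 3: stack the fields row by row into a character matrix, then a data frame
df <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
colnames(df) <- c("Date", "Value", "Unit")

# Step 4: swap the decimal comma for a dot and convert to numeric
df$Value <- as.numeric(sub(",", ".", df$Value, fixed = TRUE))
```

Passing stringsAsFactors = FALSE explicitly keeps the columns as character regardless of the R version, so as.numeric() can be applied directly.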


You can avoid sub("Title\n", "", s) (thanks to @PoGibas) and make it slightly more compact with:

df <- do.call(rbind.data.frame, strsplit(unlist(strsplit(s, "\n"))[-1], " "))
colnames(df) <- c("Date", "Value", "Unit");
df$Value <- as.numeric(as.character(sub(",", ".", df$Value)));

The output is the same as above.


Sample data

s <-   "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"

I applied strsplit twice and then built a matrix, which is converted into a data frame:

# Making a short object containing your string
x <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"

# Two string splits (first splitting by "\n" and then by " "), and discarding the "title" (by taking [[1]][2:11])
x <- unlist(strsplit(strsplit(x, split = "\n")[[1]][2:11], split = " "))

# Putting it in a data frame (dropping the € symbol)
df1 <- data.frame(matrix(x, ncol = 3, byrow = TRUE)[,1:2])

Result:

> df1
           X1    X2
1       Today 1,239
2   Yesterday   1,2
3  17/04/2018   1,2
4  14/04/2018   1,2
5  13/04/2018   1,2
6  12/04/2018   1,2
7  11/04/2018   1,2
8  09/04/2018   1,2
9  08/04/2018   1,2
10 07/04/2018   1,2

I would also replace "," with "." so that the values can be converted to numeric:

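# x must be the original string here, not the already-split vector from above
x <- unlist(strsplit(strsplit(x, split = "\n")[[1]][2:11], split = " "))
x <- gsub(",", ".", x)
df1 <- data.frame(matrix(x, ncol = 3, byrow = TRUE)[,1:2])
# as.character() makes this work whether the column is a factor (R < 4.0) or character
df1[,2] <- as.numeric(as.character(df1[,2]))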

Here is a solution with read.table:

> read.table(text=str, sep=' ', skip=1, col.names=c('Date', 'Price', 'Currency'))
         Date Price Currency
1       Today 1,239        €
2   Yesterday   1,2        €
3  17/04/2018   1,2        €
4  14/04/2018   1,2        €
5  13/04/2018   1,2        €
6  12/04/2018   1,2        €
7  11/04/2018   1,2        €
8  09/04/2018   1,2        €
9  08/04/2018   1,2        €
10 07/04/2018   1,2        €

where str is your data. Note that the skip argument drops the 'Title' line.
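If the price should come out as a number rather than text, read.table's dec argument may be enough. The following is a sketch on a shortened copy of the question's string; the variable name txt is an assumption, chosen to avoid masking base R's str() function.

```r
# Assumed sample input: a shortened version of the string from the question
txt <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €"

# dec = "," tells read.table to treat the comma as the decimal mark,
# so Price is read as numeric instead of text
df <- read.table(text = txt, sep = " ", skip = 1, dec = ",",
                 col.names = c("Date", "Price", "Currency"),
                 stringsAsFactors = FALSE)
```

With this variant no separate sub() and as.numeric() step is needed for the Price column.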

Here is a solution using strsplit and tidyr::separate.

prices <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"

library(tidyr)  # provides separate() and re-exports %>%

prices <- data.frame(x = strsplit(prices, "\n", fixed = TRUE)[[1]])
prices <- prices %>% separate(x, into = c("Date", "Prices"), sep = " ", extra = "merge")
prices <- prices[-1,]
prices
#          Date  Prices
# 2       Today 1,239 €
# 3   Yesterday   1,2 €
# 4  17/04/2018   1,2 €
# 5  14/04/2018   1,2 €
# 6  13/04/2018   1,2 €
# 7  12/04/2018   1,2 €
# 8  11/04/2018   1,2 €
# 9  09/04/2018   1,2 €
# 10 08/04/2018   1,2 €
# 11 07/04/2018   1,2 €
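Because the Prices column still holds text such as "1,239 €", it may be useful to derive a numeric column from it. The sketch below uses base R only and a small hand-built data frame shaped like the output above; the column name Value and the helper variable amount are assumptions for illustration.

```r
# Assumed input: a Prices column shaped like the output above
prices <- data.frame(Date = c("Today", "Yesterday"),
                     Prices = c("1,239 €", "1,2 €"),
                     stringsAsFactors = FALSE)

# Keep everything before the first space, i.e. drop the currency suffix
amount <- sub(" .*$", "", prices$Prices)

# Replace the decimal comma with a dot and convert to numeric
prices$Value <- as.numeric(sub(",", ".", amount, fixed = TRUE))
```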