String separated by \n to dataframe

Question

I have the following string:

  "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"

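But I don't know whether a data frame can be obtained from it. I would like a data frame with two columns (date and price) built from my string, like this (the Title line is not needed):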

Date       Price
Today      1,239 €
Yesterday  1,2 €
17/04/2018 1,2 €
14/04/2018 1,2 €
13/04/2018 1,2 €
12/04/2018 1,2 €
11/04/2018 1,2 €
09/04/2018 1,2 €
08/04/2018 1,2 €
07/04/2018 1,2 €

This is almost the same as what I get with the cat function, but I think it should be possible to convert it to a data frame. Any ideas?

Answer 1

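I would suggest something like this to convert the string s into a data.frame. The idea is to separate date, value and unit: keeping the unit apart from the numeric entries makes the data easier to work with.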

df <- do.call(rbind.data.frame, strsplit(
    unlist(strsplit(sub("Title\n", "", s), "\n")),
    " "))
colnames(df) <- c("Date", "Value", "Unit");
df$Value <- as.numeric(as.character(sub(",", ".", df$Value)));
#         Date Value Unit
#1       Today 1.239    €
#2   Yesterday 1.200    €
#3  17/04/2018 1.200    €
#4  14/04/2018 1.200    €
#5  13/04/2018 1.200    €
#6  12/04/2018 1.200    €
#7  11/04/2018 1.200    €
#8  09/04/2018 1.200    €
#9  08/04/2018 1.200    €
#10 07/04/2018 1.200    €

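Explanation: we first split s on "\n", then split each line on the space character into Date, Value and Unit. Since your values use a comma as the decimal separator, we replace "," with "." and convert to numeric.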
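The steps in the explanation can also be spelled out one at a time. This is a minimal sketch on a shortened copy of the string; the intermediate names rows and fields are illustrative and not part of the answer above, and it binds the pieces into a character matrix first, which is a slight variation on rbind.data.frame.

```r
# Shortened copy of the sample string (assumption: same layout as the full data)
s <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €"

# Step 1: split on newlines and drop the first element ("Title")
rows <- strsplit(s, "\n", fixed = TRUE)[[1]][-1]

# Step 2: split every row on the space character into date, value and unit
fields <- strsplit(rows, " ", fixed = TRUE)

# Step 3: stack the pieces into a character matrix, then make a data frame
df <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
names(df) <- c("Date", "Value", "Unit")

# Step 4: swap the decimal comma for a dot and convert to numeric
df$Value <- as.numeric(sub(",", ".", df$Value, fixed = TRUE))
```

Going through a matrix keeps every column as plain character until the explicit conversion, so the result does not depend on the stringsAsFactors default of the R version in use.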

You can avoid sub("Title\n", "", s) (thanks to @PoGibas) and make it slightly more compact by doing:

df <- do.call(rbind.data.frame, strsplit(unlist(strsplit(s, "\n"))[-1], " "))
colnames(df) <- c("Date", "Value", "Unit");
df$Value <- as.numeric(as.character(sub(",", ".", df$Value)));

The output is the same as above.

Sample data

s <-   "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"

Answer 2

I used strsplit a couple of times and then built a matrix, which is turned into a data frame:

# Making a short object containing your string
x <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"

# Two string splits (first splitting by "\n" and then by " "), and discarding the "title" (by taking [[1]][2:11])
x <- unlist(strsplit(strsplit(x, split = "\n")[[1]][2:11], split = " "))

# Putting it in a data frame (dropping the € symbol)
df1 <- data.frame(matrix(x, ncol = 3, byrow = T)[,1:2])

Result:

> df1
           X1    X2
1       Today 1,239
2   Yesterday   1,2
3  17/04/2018   1,2
4  14/04/2018   1,2
5  13/04/2018   1,2
6  12/04/2018   1,2
7  11/04/2018   1,2
8  09/04/2018   1,2
9  08/04/2018   1,2
10 07/04/2018   1,2

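I would also add a substitution of "," with "." so that the values can be converted to numeric:

# x is the original string here (not the already-split vector from above)
tokens <- unlist(strsplit(strsplit(x, split = "\n")[[1]][2:11], split = " "))
tokens <- gsub(",", ".", tokens)
df1 <- data.frame(matrix(tokens, ncol = 3, byrow = TRUE)[, 1:2])
# as.character() makes this work whether the column is character (R >= 4.0) or factor
df1[, 2] <- as.numeric(as.character(df1[, 2]))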
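The hard-coded index 2:11 only fits a string with exactly ten data rows. A possible variation, sketched here on a shortened copy of the string, drops the title with negative indexing instead; the names tokens and m and the column names Date and Price are illustrative choices, not part of the answer above.

```r
# Shortened copy of the sample string (assumption: same layout as the full data)
x <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €"

# [-1] drops the title whatever the number of rows that follow
tokens <- unlist(strsplit(strsplit(x, split = "\n")[[1]][-1], split = " "))

# Three tokens per row: date, price, currency symbol
m <- matrix(tokens, ncol = 3, byrow = TRUE)

# Keep the first two columns and convert the price on the way in
df1 <- data.frame(
  Date  = m[, 1],
  Price = as.numeric(sub(",", ".", m[, 2], fixed = TRUE)),
  stringsAsFactors = FALSE
)
```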

Answer 3

Here is a solution with read.table:

> read.table(text=str, sep=' ', skip=1, col.names=c('Date', 'Price', 'Currency'))
         Date Price Currency
1       Today 1,239        €
2   Yesterday   1,2        €
3  17/04/2018   1,2        €
4  14/04/2018   1,2        €
5  13/04/2018   1,2        €
6  12/04/2018   1,2        €
7  11/04/2018   1,2        €
8  09/04/2018   1,2        €
9  08/04/2018   1,2        €
10 07/04/2018   1,2        €

where str is your data. Note that the skip argument drops the 'Title' line.
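If the price should be numeric straight away, read.table's dec argument should be able to handle the comma decimal separator. A minimal sketch on a shortened copy of the data, using the name s rather than str so that base R's str() function is not masked:

```r
# Shortened copy of the sample string (assumption: same layout as the full data)
s <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €"

# skip = 1 drops the "Title" line; dec = "," reads 1,239 as the number 1.239
df <- read.table(
  text = s, sep = " ", skip = 1, dec = ",",
  col.names = c("Date", "Price", "Currency"),
  stringsAsFactors = FALSE
)
```

With this call the Price column comes back as numeric, so no separate sub() and as.numeric() step is needed.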

Answer 4

Here is a solution with strsplit and tidyr::separate.

prices <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"

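library(tidyr)  # provides separate() and re-exports %>%

# Split on newlines; one row per line, including the "Title" line
prices <- data.frame(x = strsplit(prices, "\n", fixed = TRUE)[[1]])
# Split at the first space only; fill = "right" pads the one-word "Title" row with NA
prices <- prices %>% separate(x, into = c("Date", "Prices"), sep = " ", extra = "merge", fill = "right")
prices <- prices[-1, ]  # drop the "Title" row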
prices
#          Date  Prices
# 2       Today 1,239 €
# 3   Yesterday   1,2 €
# 4  17/04/2018   1,2 €
# 5  14/04/2018   1,2 €
# 6  13/04/2018   1,2 €
# 7  12/04/2018   1,2 €
# 8  11/04/2018   1,2 €
# 9  09/04/2018   1,2 €
# 10 08/04/2018   1,2 €
# 11 07/04/2018   1,2 €
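The Prices column above still holds text such as "1,239 €". One way to get a number out of it, sketched in base R on a hand-built two-row data frame of the same shape (the column name Value is an illustrative choice):

```r
# Hand-built stand-in for the result above (assumption: same column layout)
prices <- data.frame(
  Date   = c("Today", "Yesterday"),
  Prices = c("1,239 €", "1,2 €"),
  stringsAsFactors = FALSE
)

# Remove the trailing " €", then swap the decimal comma for a dot
amount <- sub(" €", "", prices$Prices, fixed = TRUE)
prices$Value <- as.numeric(sub(",", ".", amount, fixed = TRUE))
```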
