Transpose after separating only the variable name
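I'm new to R, but I'm hooked on mastering it! I'm working on a project for work and I'm completely stumped! Any help is greatly appreciated!
I need to transform this data frame...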
Brand UK__Sales__YA UK__Sales__MAT CN__Sales__YA CN__Sales__MAT
1 Snickers 100 110 90 95
2 Twix 50 60 30 35
3 Skittles 75 80 105 130
...into this:
Brand Country Year Sales
1 Snickers UK YA 100
2 Snickers UK MAT 110
3 Snickers CN YA 90
4 Snickers CN MAT 95
5 Twix UK YA 50
6 Twix UK MAT 60
7 Twix CN YA 30
8 Twix CN MAT 35
9 Skittles UK YA 75
10 Skittles UK MAT 80
11 Skittles CN YA 105
12 Skittles CN MAT 130
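As you can see, I need to split off the first and last parts of each sales column name and stack the data with those parts as separate columns. My dataset also has other countries and other metrics, but I think if you can help me with this, I can finish the rest myself. Thanks!! :-)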
Check out the tidyr package (in fact, all of the packages in the tidyverse are helpful for this kind of data-wrangling work):
library(tidyr)
library(dplyr)
df %>%
gather(key, Sales, -Brand) %>%
separate(key, c("Country", "delete", "Year"), sep = "__") %>%
select(-delete) %>%
arrange(Brand)
# Brand Country Year Sales
# 1 Skittles UK YA 75
# 2 Skittles UK MAT 80
# 3 Skittles CN YA 105
# 4 Skittles CN MAT 130
# 5 Snickers UK YA 100
# 6 Snickers UK MAT 110
# 7 Snickers CN YA 90
# 8 Snickers CN MAT 95
# 9 Twix UK YA 50
# 10 Twix UK MAT 60
# 11 Twix CN YA 30
# 12 Twix CN MAT 35
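To understand what's going on, run each piped (%>%) statement on its own: for example, look at the output of df %>% gather(key, Sales, -Brand) to see what it does. Then add the separate step and run the pipeline again to follow the transformation.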
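A minimal sketch of that step-by-step inspection. It rebuilds the question's data as `df` so it runs on its own and keeps each intermediate result in a variable (`step1`, `step2`, `step3` are illustrative names, not part of the answer). It assumes tidyr and dplyr are installed.

```r
library(tidyr)
library(dplyr)

# The question's data, rebuilt so this block is self-contained
df <- data.frame(
  Brand = c("Snickers", "Twix", "Skittles"),
  UK__Sales__YA = c(100L, 50L, 75L),
  UK__Sales__MAT = c(110L, 60L, 80L),
  CN__Sales__YA = c(90L, 30L, 105L),
  CN__Sales__MAT = c(95L, 35L, 130L)
)

# Step 1: wide -> long; `key` holds the old column names, `Sales` the values
step1 <- df %>% gather(key, Sales, -Brand)

# Step 2: split each old column name on "__" into three new columns
step2 <- step1 %>% separate(key, c("Country", "delete", "Year"), sep = "__")

# Step 3: drop the constant "Sales" piece and sort by brand
step3 <- step2 %>% select(-delete) %>% arrange(Brand)

head(step1, 3)
head(step2, 3)
```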
Here is an option with the tidyverse. We gather into 'long' format and then extract the 'Var' column into 'Country' and 'Year':
library(tidyr)
library(dplyr)
# df1 is the question's data frame; regex escapes need a doubled backslash in R strings
gather(df1, Var, Sales, -Brand) %>%
  extract(Var, into = c("Country", "Year"), regex = "(\\w+)__\\w+__(\\w+)")
# Brand Country Year Sales
#1 Snickers UK YA 100
#2 Twix UK YA 50
#3 Skittles UK YA 75
#4 Snickers UK MAT 110
#5 Twix UK MAT 60
#6 Skittles UK MAT 80
#7 Snickers CN YA 90
#8 Twix CN YA 30
#9 Skittles CN YA 105
#10 Snickers CN MAT 95
#11 Twix CN MAT 35
#12 Skittles CN MAT 130
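Inside an ordinary R string, the regex escape `\w` has to be typed as `\\w`; a lone `\w` is an unrecognized escape and R refuses to parse it. The sketch below checks what the `extract()` pattern captures from each column name. It uses base R's `regexec()` and `regmatches()` as a stand-in for `extract()`, an assumption made only so the check needs no packages. Note that `\w` also matches `_`, so the pattern relies on the two `__` separators to locate the boundaries.

```r
# Column names from the question
col_names <- c("UK__Sales__YA", "UK__Sales__MAT", "CN__Sales__YA", "CN__Sales__MAT")

# Same pattern as the extract() call, with backslashes doubled inside the string
pattern <- "(\\w+)__\\w+__(\\w+)"

# Each element of m is c(full match, capture group 1, capture group 2)
m <- regmatches(col_names, regexec(pattern, col_names))
country <- vapply(m, `[`, character(1), 2)
year    <- vapply(m, `[`, character(1), 3)

country
year
```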
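The corresponding data.table option is:
library(data.table)
# setDT() converts df1 to a data.table in place; melt's argument is value.name (singular)
melt(setDT(df1), id.vars = "Brand", value.name = "Sales")[,
  c("Country", "Year") := tstrsplit(variable, "__")[-2]][, variable := NULL][]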
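`tstrsplit()` behaves like a transposed `strsplit()`. The base-R sketch below shows what `tstrsplit(variable, "__")[-2]` computes: one vector per piece position, with the middle ("Sales") piece dropped. The hand-rolled transpose is for illustration only and is not how data.table implements it.

```r
# Column names as they appear in the melted `variable` column
variable <- c("UK__Sales__YA", "UK__Sales__MAT", "CN__Sales__YA", "CN__Sales__MAT")

# strsplit() returns one character vector of pieces per name ...
pieces <- strsplit(variable, "__", fixed = TRUE)

# ... transposing gives one vector per piece position (what tstrsplit returns),
# and [-2] drops the second position, the constant "Sales" piece
by_position <- lapply(1:3, function(i) vapply(pieces, `[`, character(1), i))
kept <- by_position[-2]
names(kept) <- c("Country", "Year")

kept
```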
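1) dplyr/tidyr. Using the data shown reproducibly in the Note at the end, gather the data frame from wide to long form and then separate out the new columns. Spread the new Variable column into one column per metric (Sales, plus Price for DF2), using the Value column for the values, and then sort. The last line of code can be omitted if the order doesn't matter.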
library(dplyr)
library(tidyr)
DF %>%
gather(new, Value, -Brand) %>%
separate(new, c("Country", "Variable", "Year"), sep = "__") %>%
spread(Variable, Value) %>%
arrange(Brand, desc(Country), desc(Year))
giving:
Brand Country Year Sales
1 Skittles UK YA 75
2 Skittles UK MAT 80
3 Skittles CN YA 105
4 Skittles CN MAT 130
5 Snickers UK YA 100
6 Snickers UK MAT 110
7 Snickers CN YA 90
8 Snickers CN MAT 95
9 Twix UK YA 50
10 Twix UK MAT 60
11 Twix CN YA 30
12 Twix CN MAT 35
Note that the above also works with DF2, which is also defined in the Note below.
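A sketch of approach 1 applied to DF2. It rebuilds DF and DF2 exactly as in the Note so it runs on its own, and assumes dplyr and tidyr are installed. `res2` is an illustrative name. Because DF2 has two metrics, `spread()` produces a Price column alongside Sales.

```r
library(dplyr)
library(tidyr)

# DF and DF2 built as in the Note
Lines <- " Brand UK__Sales__YA UK__Sales__MAT CN__Sales__YA CN__Sales__MAT
1 Snickers 100 110 90 95
2 Twix 50 60 30 35
3 Skittles 75 80 105 130"
DF <- read.table(text = Lines)
DF2 <- cbind(DF, setNames(10 * DF[2:5], sub("Sales", "Price", names(DF)[2:5])))

# Same pipeline as approach 1; spread() creates one column per metric
res2 <- DF2 %>%
  gather(new, Value, -Brand) %>%
  separate(new, c("Country", "Variable", "Year"), sep = "__") %>%
  spread(Variable, Value) %>%
  arrange(Brand, desc(Country), desc(Year))
res2
```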
1a) This slightly shorter alternative also works, but only with DF, not with DF2. Again, the arrange line can be omitted if the order doesn't matter.
DF %>%
gather(new, Sales, -Brand) %>%
separate(new, c("Country", "Year"), sep = "__Sales__") %>%
arrange(Brand, desc(Country), desc(Year))
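In tidyr 1.0.0 and later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. The sketch below does the same reshape with `pivot_longer()`. This is a swapped-in technique, not part of the original answer, so check it against your installed tidyr version. The `".value"` sentinel in `names_to` tells `pivot_longer()` to use the middle piece of each name as the name of a value column. The result should come out in the same row order as the desired output in the question.

```r
library(tidyr)

# DF built as in the Note, so this block runs on its own
Lines <- " Brand UK__Sales__YA UK__Sales__MAT CN__Sales__YA CN__Sales__MAT
1 Snickers 100 110 90 95
2 Twix 50 60 30 35
3 Skittles 75 80 105 130"
DF <- read.table(text = Lines)

# Split each name on "__": first piece -> Country, middle piece names the
# value column (".value"), last piece -> Year
long <- pivot_longer(DF, -Brand,
                     names_to = c("Country", ".value", "Year"),
                     names_sep = "__")
long
```

With DF2, the same call should give both a Sales and a Price column, because `.value` takes the column names from the middle piece.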
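2) This alternative uses no packages: it uses base R's reshape to go from wide to long form. If row names and order don't matter, the rownames(out) <- NULL line and the two reordering lines after it can be omitted. This code also works with DF2.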
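# group the wide column names by metric (Sales, plus Price for DF2)
varying <- split(names(DF)[-1], sub(".*__(.*)__.*", "\\1", names(DF)[-1]))
# times: each name with the metric removed, e.g. "UK__Sales__YA" -> "UK__YA"
long <- reshape(DF, dir = "long", idvar = "Brand", varying = varying,
  v.names = names(varying), times = sub("__.*__", "__", varying[[1]]))
out <- transform(long, Country = sub("__.*", "", time), Year = sub(".*__", "", time),
  time = NULL)
rownames(out) <- NULL
o <- with(out, order(Brand, -xtfrm(Country), -xtfrm(Year)))
out <- out[o, ]
out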
giving:
Brand Sales Country Year
3 Skittles 75 UK YA
6 Skittles 80 UK MAT
9 Skittles 105 CN YA
12 Skittles 130 CN MAT
1 Snickers 100 UK YA
4 Snickers 110 UK MAT
7 Snickers 90 CN YA
10 Snickers 95 CN MAT
2 Twix 50 UK YA
5 Twix 60 UK MAT
8 Twix 30 CN YA
11 Twix 35 CN MAT
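To see how the base-R version groups the wide columns, it helps to print `varying`. The base-R sketch below rebuilds DF2 as in the Note. The `country` and `year` names are illustrative; they show how Country and Year are read off the first and last pieces of each name.

```r
# DF and DF2 built as in the Note
Lines <- " Brand UK__Sales__YA UK__Sales__MAT CN__Sales__YA CN__Sales__MAT
1 Snickers 100 110 90 95
2 Twix 50 60 30 35
3 Skittles 75 80 105 130"
DF <- read.table(text = Lines)
DF2 <- cbind(DF, setNames(10 * DF[2:5], sub("Sales", "Price", names(DF)[2:5])))

# Group the wide column names by their middle piece (the metric)
nms <- names(DF2)[-1]
varying <- split(nms, sub(".*__(.*)__.*", "\\1", nms))
str(varying)

# Country is everything before the first "__", Year everything after the last
country <- sub("__.*", "", varying$Sales)
year <- sub(".*__", "", varying$Sales)
```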
Note
Lines <- " Brand UK__Sales__YA UK__Sales__MAT CN__Sales__YA CN__Sales__MAT
1 Snickers 100 110 90 95
2 Twix 50 60 30 35
3 Skittles 75 80 105 130"
DF <- read.table(text = Lines)
# same as DF but with additional columns for Price
DF2 <- cbind(DF, setNames(10 * DF[2:5], sub("Sales", "Price", names(DF)[2:5])))
Here is a solution using the reshape2 package:
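# `data` is defined under "Data" below
new <- reshape2::melt(data, id.vars = "Brand")
# Country: text before the first "_"; Year: trailing letters after the last "_"
new$Country <- sub("(^[^_]*)_.*$", "\\1", new$variable)
new$Year <- sub("^.*_([[:alpha:]]*$)", "\\1", new$variable)
new <- new[, c(1, 4, 5, 3)]
names(new)[4] <- "Sales"
head(new)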
head(new)
# Brand Country Year Sales
#1 Snickers UK YA 100
#2 Twix UK YA 50
#3 Skittles UK YA 75
#4 Snickers UK MAT 110
#5 Twix UK MAT 60
#6 Skittles UK MAT 80
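In `sub()`, the back-reference to the first capture group must be written `"\\1"` in R source. A single `"\1"` is an octal escape for one control character, so `sub()` would replace each whole match with that character instead of the captured text. The base-R sketch below checks both regexes on two of the column names. `x`, `country` and `year` are illustrative names.

```r
x <- c("UK__Sales__YA", "CN__Sales__MAT")

# "\\1" is the two characters backslash and 1: a back-reference to group 1
country <- sub("(^[^_]*)_.*$", "\\1", x)
year    <- sub("^.*_([[:alpha:]]*$)", "\\1", x)

# "\1" is a single character (code point 1), not a back-reference
nchar("\\1")
nchar("\1")
```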
Data
data <-
structure(list(Brand = c("Snickers", "Twix", "Skittles"), UK__Sales__YA = c(100L,
50L, 75L), UK__Sales__MAT = c(110L, 60L, 80L), CN__Sales__YA = c(90L,
30L, 105L), CN__Sales__MAT = c(95L, 35L, 130L)), .Names = c("Brand",
"UK__Sales__YA", "UK__Sales__MAT", "CN__Sales__YA", "CN__Sales__MAT"
), class = "data.frame", row.names = c("1", "2", "3"))