Splitting single-row data into multiple rows
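I have a dataset of co-authorship among about 3k authors. It has Sender and Receiver (or Source and Target) columns and a column containing the journal name and publication year. If a pair of authors has more than one joint article, the entries are stored in a single row, separated by commas. What I want to do is split those rows into multiple rows. data.frame - my GitHub repository
For example: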
HALL M,DE JONG GF, "['GRAEFE DR 2008 INTERNATIONAL MIGRATION REVIEW', 'HALL M 2010 SOCIAL SCIENCE RESEARCH']"
I need to split the last column like this:
HALL M,DE JONG GF, GRAEFE DR 2008 INTERNATIONAL MIGRATION REVIEW
HALL M,DE JONG GF, HALL M 2010 SOCIAL SCIENCE RESEARCH
I have heard that I need to write a simple loop in R, but I don't know what it should look like.
Edit
The dput output of the first 20 rows of my data:
> dput(head(temp,n=20))
structure(list(Source = c("HUMPHREY CR", "HUMPHREY CR", "HUMPHREY CR",
"SELL RR", "SELL RR", "SELL RR", "GARDNER RW", "GARDNER RW",
"GARDNER RW", "GARDNER RW", "GARDNER RW", "GARDNER RW", "GARDNER RW",
"GARDNER RW", "FAWCETT JT", "FAWCETT JT", "FAWCETT JT", "FAWCETT JT",
"FAWCETT JT", "FAWCETT JT"), Target = c("SELL RR", "GILLASPY RT",
"KROUT JA", "GILLASPY RT", "KROUT JA", "DEJONG GF", "FAWCETT JT",
"ARNOLD F", "CARINO BV", "ROOT BD", "DEJONG G", "ABAD RG", "DEJONG GF",
"BOUVIER LF", "ARNOLD F", "PARK IH", "CARINO BV", "ROOT BD",
"DEJONG G", "ABAD RG"), Type = c("Undirected", "Undirected",
"Undirected", "Undirected", "Undirected", "Undirected", "Undirected",
"Undirected", "Undirected", "Undirected", "Undirected", "Undirected",
"Undirected", "Undirected", "Undirected", "Undirected", "Undirected",
"Undirected", "Undirected", "Undirected"), Id = c(2386L, 2385L,
2384L, 3635L, 3634L, 3636L, 401L, 397L, 398L, 399L, 403L, 396L,
400L, 402L, 598L, 602L, 601L, 604L, 605L, 597L), Label = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA), Weight = c(1, 1, 1, 1, 1, 1, 3, 2, 2, 1, 1, 2, 2,
1, 3, 1, 2, 1, 1, 2), ayjid = c("['HUMPHREY CR 1977 RURAL SOCIOLOGY']",
"['HUMPHREY CR 1977 RURAL SOCIOLOGY']", "['HUMPHREY CR 1977 RURAL SOCIOLOGY']",
"['HUMPHREY CR 1977 RURAL SOCIOLOGY']", "['HUMPHREY CR 1977 RURAL SOCIOLOGY']",
"['SELL RR 1978 JOURNAL OF POPULATION']", "['DEJONG GF 1983 INTERNATIONAL MIGRATION REVIEW', 'DEJONG G 1986 POPULATION AND ENVIRONMENT', 'FAWCETT JT 1994 POPULATION AND ENVIRONMENT']",
"['DEJONG GF 1983 INTERNATIONAL MIGRATION REVIEW', 'GARDNER RW 1986 POPULATION AND ENVIRONMENT']",
"['DEJONG GF 1983 INTERNATIONAL MIGRATION REVIEW', 'GARDNER RW 1986 POPULATION AND ENVIRONMENT']",
"['DEJONG G 1986 POPULATION AND ENVIRONMENT']", "['DEJONG G 1986 POPULATION AND ENVIRONMENT']",
"['DEJONG GF 1983 INTERNATIONAL MIGRATION REVIEW', 'DEJONG G 1986 POPULATION AND ENVIRONMENT']",
"['DEJONG GF 1983 INTERNATIONAL MIGRATION REVIEW', 'GARDNER RW 1986 POPULATION AND ENVIRONMENT']",
"['BOUVIER LF 1986 POPULATION BULLETIN']", "['DEJONG GF 1983 INTERNATIONAL MIGRATION REVIEW', 'ARNOLD F 1989 INTERNATIONAL MIGRATION REVIEW', 'FAWCETT JT 1987 INTERNATIONAL MIGRATION REVIEW']",
"['ARNOLD F 1989 INTERNATIONAL MIGRATION REVIEW']", "['DEJONG GF 1983 INTERNATIONAL MIGRATION REVIEW', 'ARNOLD F 1989 INTERNATIONAL MIGRATION REVIEW']",
"['DEJONG G 1986 POPULATION AND ENVIRONMENT']", "['DEJONG G 1986 POPULATION AND ENVIRONMENT']",
"['DEJONG GF 1983 INTERNATIONAL MIGRATION REVIEW', 'DEJONG G 1986 POPULATION AND ENVIRONMENT']"
)), .Names = c("Source", "Target", "Type", "Id", "Label", "Weight",
"ayjid"), row.names = c(NA, 20L), class = "data.frame")
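Try the following:
# df is the data frame from the question (called temp in the dput output)
# backslashes must be doubled inside an R string literal
s <- strsplit(gsub("\\[|\\]|'", "", df$ayjid), ", ", fixed = TRUE)
res <- data.frame(Id = rep(df$Id, lengths(s)), result = unlist(s))
merge(df, res)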
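To see why this works, here is a minimal self-contained sketch on a two-row toy data frame. The name `toy` and its values are made up for illustration and only modelled on the question's data. `lengths(s)` gives the number of articles in each row, so `rep()` repeats each Id once per article, and `merge()` then joins the split values back on the shared Id column.

```r
# Toy data modelled on the question; `toy` is a hypothetical name.
toy <- data.frame(
  Id = c(1L, 2L),
  Source = c("HALL M", "SELL RR"),
  ayjid = c(
    "['GRAEFE DR 2008 INTERNATIONAL MIGRATION REVIEW', 'HALL M 2010 SOCIAL SCIENCE RESEARCH']",
    "['SELL RR 1978 JOURNAL OF POPULATION']"
  ),
  stringsAsFactors = FALSE
)

# Strip brackets and single quotes, then split on the literal ", " separator.
s <- strsplit(gsub("\\[|\\]|'", "", toy$ayjid), ", ", fixed = TRUE)

# One row per article: repeat each Id as many times as it has articles.
res <- data.frame(Id = rep(toy$Id, lengths(s)), result = unlist(s),
                  stringsAsFactors = FALSE)

# Join back to the original columns on the shared Id column.
out <- merge(toy, res)
print(out[, c("Id", "Source", "result")])
```

Note that this relies on Id being unique per original row, which holds for the sample shown in the question; if it were not, the merge would produce extra row combinations.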
This is very straightforward with cSplit from my "splitstackshape" package:
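library(splitstackshape)
library(data.table)  # for as.data.table() and :=
# strip the square brackets, then split ayjid on "," into long format
cSplit(as.data.table(temp)[, ayjid := gsub("[][]", "", ayjid)],
       "ayjid", ",", "long")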
# Source Target Type Id Label Weight ayjid
# 1: HUMPHREY CR SELL RR Undirected 2386 NA 1 'HUMPHREY CR 1977 RURAL SOCIOLOGY'
# 2: HUMPHREY CR GILLASPY RT Undirected 2385 NA 1 'HUMPHREY CR 1977 RURAL SOCIOLOGY'
# 3: HUMPHREY CR KROUT JA Undirected 2384 NA 1 'HUMPHREY CR 1977 RURAL SOCIOLOGY'
# 4: SELL RR GILLASPY RT Undirected 3635 NA 1 'HUMPHREY CR 1977 RURAL SOCIOLOGY'
# 5: SELL RR KROUT JA Undirected 3634 NA 1 'HUMPHREY CR 1977 RURAL SOCIOLOGY'
# 6: SELL RR DEJONG GF Undirected 3636 NA 1 'SELL RR 1978 JOURNAL OF POPULATION'
# 7: GARDNER RW FAWCETT JT Undirected 401 NA 3 'DEJONG GF 1983 INTERNATIONAL MIGRATION REVIEW'
# 8: GARDNER RW FAWCETT JT Undirected 401 NA 3 'DEJONG G 1986 POPULATION AND ENVIRONMENT'
# 9: GARDNER RW FAWCETT JT Undirected 401 NA 3 'FAWCETT JT 1994 POPULATION AND ENVIRONMENT'
# 10: GARDNER RW ARNOLD F Undirected 397 NA 2 'DEJONG GF 1983 INTERNATIONAL MIGRATION REVIEW'
# 11: GARDNER RW ARNOLD F Undirected 397 NA 2 'GARDNER RW 1986 POPULATION AND ENVIRONMENT'
# 12: GARDNER RW CARINO BV Undirected 398 NA 2 'DEJONG GF 1983 INTERNATIONAL MIGRATION REVIEW'
# 13: GARDNER RW CARINO BV Undirected 398 NA 2 'GARDNER RW 1986 POPULATION AND ENVIRONMENT'
# 14: GARDNER RW ROOT BD Undirected 399 NA 1 'DEJONG G 1986 POPULATION AND ENVIRONMENT'
# 15: GARDNER RW DEJONG G Undirected 403 NA 1 'DEJONG G 1986 POPULATION AND ENVIRONMENT'
# 16: GARDNER RW ABAD RG Undirected 396 NA 2 'DEJONG GF 1983 INTERNATIONAL MIGRATION REVIEW'
# 17: GARDNER RW ABAD RG Undirected 396 NA 2 'DEJONG G 1986 POPULATION AND ENVIRONMENT'
# 18: GARDNER RW DEJONG GF Undirected 400 NA 2 'DEJONG GF 1983 INTERNATIONAL MIGRATION REVIEW'
# 19: GARDNER RW DEJONG GF Undirected 400 NA 2 'GARDNER RW 1986 POPULATION AND ENVIRONMENT'
# 20: GARDNER RW BOUVIER LF Undirected 402 NA 1 'BOUVIER LF 1986 POPULATION BULLETIN'
# 21: FAWCETT JT ARNOLD F Undirected 598 NA 3 'DEJONG GF 1983 INTERNATIONAL MIGRATION REVIEW'
# 22: FAWCETT JT ARNOLD F Undirected 598 NA 3 'ARNOLD F 1989 INTERNATIONAL MIGRATION REVIEW'
# 23: FAWCETT JT ARNOLD F Undirected 598 NA 3 'FAWCETT JT 1987 INTERNATIONAL MIGRATION REVIEW'
# 24: FAWCETT JT PARK IH Undirected 602 NA 1 'ARNOLD F 1989 INTERNATIONAL MIGRATION REVIEW'
# 25: FAWCETT JT CARINO BV Undirected 601 NA 2 'DEJONG GF 1983 INTERNATIONAL MIGRATION REVIEW'
# 26: FAWCETT JT CARINO BV Undirected 601 NA 2 'ARNOLD F 1989 INTERNATIONAL MIGRATION REVIEW'
# 27: FAWCETT JT ROOT BD Undirected 604 NA 1 'DEJONG G 1986 POPULATION AND ENVIRONMENT'
# 28: FAWCETT JT DEJONG G Undirected 605 NA 1 'DEJONG G 1986 POPULATION AND ENVIRONMENT'
# 29: FAWCETT JT ABAD RG Undirected 597 NA 2 'DEJONG GF 1983 INTERNATIONAL MIGRATION REVIEW'
# 30: FAWCETT JT ABAD RG Undirected 597 NA 2 'DEJONG G 1986 POPULATION AND ENVIRONMENT'
# Source Target Type Id Label Weight ayjid
If you also want to remove the quotes from the resulting column, you can use ayjid := gsub("[][']", "", ayjid)] instead.
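The character class in these patterns can look odd. Inside a bracket expression, a `]` placed first is taken literally, so `[][]` matches `]` or `[`, and `[][']` additionally matches a single quote. A quick base R check on one value shaped like those in the question (the variable names here are only for illustration):

```r
x <- "['DEJONG G 1986 POPULATION AND ENVIRONMENT', 'BOUVIER LF 1986 POPULATION BULLETIN']"

# "[][]" removes only the square brackets; the quotes stay.
no_brackets <- gsub("[][]", "", x)

# "[][']" removes the square brackets and the single quotes.
no_quotes <- gsub("[][']", "", x)

print(no_brackets)
print(no_quotes)
```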