Converting a long string of comma-delimited integers into x and y columns
My data is a single long line of comma-separated values, in which the values alternate between x and y coordinates.
The data look like this:
2622731,1387660,2621628,1444522,2619235,1681640
But I would like it to look like this:
2622731,1387660
2621628,1444522
2619235,1681640
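Short of going through the whole file by hand, deleting commas and pressing Enter as I did in the example above, how can I do this automatically in R (or Stata)?
In R: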
## Read in your data
## data = readLines("path/to/your_file.txt")
## Should get you something like this (using the example in your Q)
data = "2622731,1387660,2621628,1444522,2619235,1681640"
data = unlist(strsplit(data, ","))
data = matrix(as.numeric(data), ncol = 2, byrow = TRUE)
data
#         [,1]    [,2]
# [1,] 2622731 1387660
# [2,] 2621628 1444522
# [3,] 2619235 1681640
At that point, perhaps convert the matrix to a data frame and name the columns:
data = as.data.frame(data)
names(data) = c("x", "y")
#         x       y
# 1 2622731 1387660
# 2 2621628 1444522
# 3 2619235 1681640
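The steps above can be wrapped in a small helper. This is only a minimal sketch: the function name `pairs_to_xy` is made up for illustration, and the even-count check is an added assumption about what you would want, since `matrix()` would otherwise recycle values and emit only a warning when the count is odd.

```r
# Hypothetical helper (the name is illustrative, not from the answer above):
# split one comma-separated string into a two-column data frame.
pairs_to_xy <- function(line) {
  values <- as.numeric(strsplit(line, ",", fixed = TRUE)[[1]])
  # An odd count means some x has no matching y; stop here rather than
  # let matrix() recycle values with only a warning.
  if (length(values) %% 2 != 0) {
    stop("expected an even number of values, got ", length(values))
  }
  xy <- matrix(values, ncol = 2, byrow = TRUE)
  data.frame(x = xy[, 1], y = xy[, 2])
}

xy <- pairs_to_xy("2622731,1387660,2621628,1444522,2619235,1681640")
xy
```

Failing loudly on an odd count is a design choice; if a trailing unpaired value is expected in your data, you may prefer to drop it instead.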
In Stata, an analogue of the accepted R solution would probably involve split and reshape long. Here is another approach:
* data example
clear
set obs 1
gen strL data = "2622731,1387660,2621628,1444522,2619235,1681640"
* code for data example
replace data = subinstr(data, ",", " ", .)
set obs `=wordcount(data)/2'
gen x = real(word(data[1], 2 * _n - 1))
gen y = real(word(data[1], 2 * _n))
list
     +---------------------------------------------------------------------+
     |                                            data         x         y |
     |---------------------------------------------------------------------|
  1. | 2622731 1387660 2621628 1444522 2619235 1681640   2622731   1387660 |
  2. |                                                   2621628   1444522 |
  3. |                                                   2619235   1681640 |
     +---------------------------------------------------------------------+
Use scan to read the values, then reshape them with matrix:
s <- "2622731,1387660,2621628,1444522,2619235,1681640" # test data
matrix(scan(text = s, sep = ",", quiet = TRUE), ncol = 2, byrow = TRUE)
##         [,1]    [,2]
## [1,] 2622731 1387660
## [2,] 2621628 1444522
## [3,] 2619235 1681640
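Since the question starts from a file and asks for one pair per line, the same scan and matrix idea might be applied to a file and written back out. The sketch below makes two assumptions: the temporary files stand in for your real paths, and the values are whole numbers small enough for R's integer type (reading them as integers keeps write.table from switching to scientific notation).

```r
# Stand-ins for the real input and output paths (assumed for illustration).
infile  <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".csv")
writeLines("2622731,1387660,2621628,1444522,2619235,1681640", infile)

# Read the values as integers straight from the file;
# byrow = TRUE pairs consecutive values into (x, y) rows.
values <- scan(infile, what = integer(), sep = ",", quiet = TRUE)
xy <- matrix(values, ncol = 2, byrow = TRUE)

# Write one "x,y" pair per line, with no header and no row names.
write.table(xy, outfile, sep = ",", row.names = FALSE, col.names = FALSE)
readLines(outfile)
```

If the coordinates can exceed the integer range or contain decimals, drop the `what = integer()` argument and check how the numbers are formatted in the output.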