R: csv data ends in one column no matter what reading function I use
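I didn't expect my first question here to be about reading in data, but here it is.
I have "ugly" data files from a brain-imaging machine. They are in csv format (basically raw data plus a few lines of header/description at the top).
I want to do some simple subsetting in R. First, when I open the data in Excel, it looks like a mess (everything in one column), but when I use Excel's import function (Data, From CSV) it looks normal (Excel tells me it uses a comma as the delimiter).
I have tried several reading functions in R (even with different separator options), but the result always ends up in one long, ugly column.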
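A quick way to see what the R readers are up against is to count the fields on every line. According to the `read.table` help page, the number of data columns is determined from the first five lines of input (or from `col.names` if that is longer), and in this file those lines are header/description lines with one to four fields each. The sketch below uses a hypothetical, cut-down copy of the layout written to a temporary file (the real file has about 40 header lines and 30 data columns); `sample_lines` and `path` are illustrative names.

```r
# Hypothetical cut-down copy of the file layout (values taken from the post)
sample_lines <- c(
  "Header",
  "File Version,1.21",
  "Comment,,,",
  "Sampling Period[s],0.1",
  "Data",
  "Probe1(Total),CH1,CH2,Mark,Time",
  "1,0.01750125,-0.00731065,0,13:31:07.95",
  "2,-0.01928233,-0.00760579,0,13:31:08.06"
)
path <- tempfile(fileext = ".csv")
writeLines(sample_lines, path)

# Comma-separated fields per line: the header block varies (1 to 4 fields),
# while every line from the column-name row onward has the same count
n_fields <- count.fields(path, sep = ",")
print(n_fields)
```

If the counts only become constant after the `Data` line, the separator is not the problem; skipping the header block (or supplying `col.names`) is what needs fixing.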
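I can get around it by saving the data from Excel (after first importing them that way), but that changes the delimiter and prevents me from feeding the data into some other custom analysis software afterwards.
So I need the data to keep exactly the same format (only cutting out some parts of the raw data) and move on from there.
Thanks.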
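EDIT: I'm adding a link to download the file. It contains only a few entries from a test measurement of my own brain (there were about 16000 entries, but I deleted most of them to keep it short and private). You can download the csv here
Edit2: Solved. I realized this was probably caused by the way the reading functions handle the file's own header. It can be avoided by assigning the header (column) names in the script with col.names. Silly mistake, still a lot to learn :).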
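The `col.names` fix from Edit2 can be combined with `skip`, so that the header block is never parsed at all. A minimal base-R sketch, assuming a hypothetical cut-down sample written to a temporary file: it locates the `Data` marker instead of hard-coding a line number, takes the column names from the line right after it, and passes them in explicitly. `check.names = FALSE` keeps names such as `Probe1(Total)` unchanged; by default `read.csv` would turn that one into `Probe1.Total.`.

```r
# Hypothetical cut-down sample of the file layout
sample_lines <- c(
  "Header",
  "File Version,1.21",
  "Comment,,,",
  "Data",
  "Probe1(Total),CH1,CH2,Mark,Time",
  "1,0.01750125,-0.00731065,0,13:31:07.95",
  "2,-0.01928233,-0.00760579,0,13:31:08.06"
)
path <- tempfile(fileext = ".csv")
writeLines(sample_lines, path)

lines <- readLines(path)
# Find the line that holds only "Data" rather than hard-coding its position
data_line <- which(trimws(lines) == "Data")[1]
# The line right after "Data" holds the column names
nms <- strsplit(lines[data_line + 1], ",", fixed = TRUE)[[1]]

# Skip the header block and the names line, then supply the names explicitly
d <- read.csv(path, skip = data_line + 1, header = FALSE,
              col.names = nms, check.names = FALSE)
str(d)
```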
EDIT - the data look like this. Lines 1-39 are the header; the data start at line 40, which contains only the string 'Data'.
Header
File Version,1.21
Patient Information
Comment,,,
Birth Date,0000/00/00
Age, 0y
Sex,Male
Analyze Information
AnalyzeMode,Continuous
Pre Time[s],9.0
Post Time[s],7.0
Recovery Time[s],12.0
Base Time[s],5
Fitting Degree,1
HPF[Hz],No Filter
LPF[Hz],No Filter
Moving Average[s],0.1
Measure Information
Probe Type,adult
Mode,3x3
Wave[nm],695,830
Wave Length,CH1(700.0),CH1(830.8),CH2(698.3),CH2(828.4),CH3(700.0),CH3(830.8),CH4(698.9),CH4(827.8),CH5(698.3),CH5(828.4),CH6(698.9),CH6(827.8),CH7(698.9),CH7(827.8),CH8(699.2),CH8(830.0),CH9(698.9),CH9(827.8),CH10(703.7),CH10(828.2),CH11(699.2),CH11(830.0),CH12(703.7),CH12(828.2),CH13(700.2),CH13(831.2),CH14(701.4),CH14(828.2),CH15(700.2),CH15(831.2),CH16(699.7),CH16(830.7),CH17(701.4),CH17(828.2),CH18(699.7),CH18(830.7),CH19(699.7),CH19(830.7),CH20(698.1),CH20(831.0),CH21(699.7),CH21(830.7),CH22(698.3),CH22(830.6),CH23(698.1),CH23(831.0),CH24(698.3),CH24(830.6)
Analog Gain,59.60784300,59.60784300,59.60784300,59.60784300,600.00000000,600.00000000,59.60784300,59.60784300,242.35294100,242.35294100,600.00000000,600.00000000,242.35294100,242.35294100,600.00000000,600.00000000,235.29411800,235.29411800,242.35294100,242.35294100,235.29411800,235.29411800,235.29411800,235.29411800,31.29411800,31.29411800,31.29411800,31.29411800,284.70588200,284.70588200,31.29411800,31.29411800,200.00000000,200.00000000,284.70588200,284.70588200,200.00000000,200.00000000,284.70588200,284.70588200,75.29411800,75.29411800,200.00000000,200.00000000,75.29411800,75.29411800,75.29411800,75.29411800
Digital Gain,21.76000000,5.61000000,28.46000000,5.76000000,15.21000000,4.46000000,100.00000000,32.86000000,25.72000000,4.09000000,19.43000000,4.42000000,17.86000000,3.81000000,32.66000000,10.99000000,100.00000000,21.47000000,25.16000000,5.03000000,14.26000000,3.62000000,13.59000000,3.06000000,12.91000000,3.42000000,32.35000000,6.98000000,9.66000000,2.11000000,100.00000000,44.11000000,12.41000000,4.17000000,49.73000000,7.74000000,12.48000000,2.81000000,36.93000000,6.75000000,100.00000000,100.00000000,65.63000000,17.46000000,100.00000000,100.00000000,100.00000000,100.00000000
Sampling Period[s],0.1
StimType,EVENT
Stim Time[s]
F1,15,F2,15,F3,15,F4,15,F5,15,F6,15,F7,15,F8,15,F9,15,M,15
Repeat Count,3
Exception Ch,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Data
Probe1(Total),CH1,CH2,CH3,CH4,CH5,CH6,CH7,CH8,CH9,CH10,CH11,CH12,CH13,CH14,CH15,CH16,CH17,CH18,CH19,CH20,CH21,CH22,CH23,CH24,Mark,Time,BodyMovement,RemovalMark,PreScan
1,0.01750125,-0.00731065,-0.0229914,-0.01572692,0.04387726,-0.05805205,0.01678475,0.12706034,0.15895581,0.11640126,0.07686448,0.05669941,0.02798176,0.00731046,0.04893643,-0.03621271,0.04504761,0.02515063,0.02962047,0.03181091,-5.10545969,-0.05849782,-2.40030622,42.36789703,0,13:31:07.95,0,0,0
2,-0.01928233,-0.00760579,-0.04548376,0.14481309,-0.02861471,-0.0563355,-0.0301471,-0.07790314,0.0972455,0.08708155,0.02634541,0.03705737,0.00716472,-0.01115488,0.02829455,0.09065069,0.01211305,0.02277327,0.02067163,0.04387939,2.23265266,-0.0207526,-3.71421456,36.16513062,0,13:31:08.06,0,0,0
3,-0.03335796,-0.02295596,-0.07686513,0.01852697,-0.07737321,-0.13283072,-0.10482638,-0.06564587,-0.03146207,0.01997223,-0.02385158,-0.01002161,-0.026086,-0.03784704,-0.02308246,-0.08081956,-0.00761192,0.01914825,-0.00956007,0.00668299,2.47277832,-0.10391754,3.8903904,6.78369522,0,13:31:08.15,0,0,0
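Here is a way to edit only the 'Data' part of the file while leaving the header untouched:
library(data.table)
fileName <- 'test_probes.csv'
lines <- readLines(fileName)
# The line that holds only "Data" separates the header block from the table
dataLine <- which(trimws(lines) == 'Data')[1]
stopifnot(!is.na(dataLine))
header <- lines[seq_len(dataLine)]  # header block plus the "Data" line itself
# Read only the table below "Data"; its first line holds the column names
d <- fread(fileName, skip = dataLine, header = TRUE)
## do stuff to d here
# Write the untouched header, then the edited table, to out.csv
writeLines(header, 'out.csv')
fwrite(d, 'out.csv', append = TRUE, col.names = TRUE)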
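If data.table is not available, the same keep-the-header, rewrite-the-table idea can be sketched in base R. This assumes a hypothetical cut-down sample in a temporary file and an arbitrary example subset (rows 1 and 2, column `CH2` dropped); substitute the real file name and whatever subsetting is needed. The table is rendered to lines with `write.csv` inside `capture.output`, so the header, the `Data` line and the table go out in a single `writeLines` call, avoiding the warning `write.table` gives when column names are appended to an existing file.

```r
# Hypothetical cut-down sample of the file layout
sample_lines <- c(
  "Header",
  "File Version,1.21",
  "Sampling Period[s],0.1",
  "Data",
  "Probe1(Total),CH1,CH2,Mark,Time",
  "1,0.01750125,-0.00731065,0,13:31:07.95",
  "2,-0.01928233,-0.00760579,0,13:31:08.06",
  "3,-0.03335796,-0.02295596,0,13:31:08.15"
)
in_path <- tempfile(fileext = ".csv")
out_path <- tempfile(fileext = ".csv")
writeLines(sample_lines, in_path)

lines <- readLines(in_path)
data_line <- which(trimws(lines) == "Data")[1]
stopifnot(!is.na(data_line))

# Keep the header block verbatim (including the "Data" line); parse the rest.
# check.names = FALSE keeps column names such as "Probe1(Total)" unchanged.
header <- lines[seq_len(data_line)]
d <- read.csv(text = lines[-seq_len(data_line)], check.names = FALSE)

# Example subset (hypothetical): keep rows 1-2 and drop column CH2
sub <- d[1:2, setdiff(names(d), "CH2")]

# Render the subset as comma-separated lines without quotes or row names
table_lines <- capture.output(write.csv(sub, row.names = FALSE, quote = FALSE))
writeLines(c(header, table_lines), out_path)
```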