How to get the mean directly from a csv row range in R?
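Below is my code for reading more than 1000 csv files, each with more than 1000 rows and exactly 4 columns: ID, values, param1, param2. My current snippet reads each file into a data frame together with its file name. The framework itself is clean and already works, so I am only looking for code that I can integrate into it.
Example input: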
200 4.864 ne15 hx1
201 4.872 ne12 hx3
202 4.898 ne10 hx9
203 4.815 ne23 hx1
204 4.699 ne14 hx3
...
212 4.813 ne20 hx2
213 4.763 ne18 hx8
...
Example output:
index row#. value filename
# mean should be the value for row 2 to 20
# it needs to be output in R under row 202
154 202.0 4.337 1wq.csv
164 225.0 4.358 1wq.csv
174 250.0 4.421 1wq.csv
184 275.0 4.498 1wq.csv
194 300.0 4.513 1wq.csv
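Instead of picking up the consecutive values in rows 2 to 20 of that column (18 values), I want a single mean computed from the values in rows 2 to 20. How can I do that?
# set working directory to the folder where csv files are located
files <- list.files(pattern = '\\.csv$')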
m = data.frame()
for (k in 1:length(files)){
csv = read.csv(files[k], header = FALSE)
#picking up 2:20 consecutive values, value for row 50,120,150 so on
data = csv[c(2:20, 50, 120, 150, 175, 200), c(1,2)]
#-pivot transform col/row- data <- as.data.frame(t(data))
#but that line screwed up the data
#when those selected values are with NA/blanks
data$file = files[k]
m = rbind(m, data)
}
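Thanks to the two answers, I got the following working.
I will try AdamQuek's answer separately again to improve on mine.
For now, I am closing this question as solved.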
m = data.frame()
for (k in 1:length(files)) {
csv = read.csv(files[k], header = FALSE)
data = csv[c(2:20, 225, 250, 275, 300, 325, 350), c(1,2)]
data[1,] <- mean(data[c(2:19),c(2)], na.rm=T)
data <- data[-2:-19,]
data[c(1),c(1)] = 200
data$file = files[k]
data <- as.data.frame(t(data))
m = rbind(m, data)
}
files <- list.files(pattern='\\.csv$')
all <- lapply(files, read.csv, header=FALSE)
all.subset <- lapply(all, function(x)x[c(2:20, 50, 120, 150, 175, 200), c(1,2)])
col.means <- function(x) colMeans(x, na.rm=T)
do.call(rbind, lapply(all.subset, col.means))
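To make the shape of that result concrete, here is a minimal sketch on two in-memory data frames instead of real files. The `toy` data and the `rows`, `toy.subset` and `res` names are made up for illustration. Note that `colMeans` averages every selected row (2:20 as well as 50, 120, 150, 175, 200) into one row per file, which may not be what the question asks for.

```r
# Two toy "files" with constant second columns, so the means are easy to check.
toy <- list(
  data.frame(V1 = 1:200, V2 = rep(2, 200)),
  data.frame(V1 = 1:200, V2 = rep(5, 200))
)

# Same row selection as in the answer above.
rows <- c(2:20, 50, 120, 150, 175, 200)
toy.subset <- lapply(toy, function(x) x[rows, c(1, 2)])

# One row of column means per data frame, stacked into a matrix.
res <- do.call(rbind, lapply(toy.subset, colMeans, na.rm = TRUE))
print(res)
```

The result is a numeric matrix with one row per input and one column per selected column, so the individual values at rows 50, 120 and so on are no longer available separately.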
Edit:
files <- list.files(pattern='\\.csv$')
m <- data.frame()
for (k in files){
  csv <- read.csv(k, header = FALSE)[, c(1,2)]
  # mean of rows 2:20 for each of the two columns
  v1 <- mean(csv[2:20,1], na.rm=TRUE)
  v2 <- mean(csv[2:20,2], na.rm=TRUE)
  # use V1/V2 so the names match read.csv(header = FALSE) and rbind() works
  mean.val <- data.frame(V1=v1, V2=v2, file=k)
  subset.data <- csv[c(50, 120, 150, 175, 200),]
  subset.data$file <- k
  subset.data <- rbind(mean.val, subset.data)
  m <- rbind(m, subset.data)
}
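As a rough sketch, the same idea can be wrapped in a function and applied with `lapply`, which avoids growing `m` inside a loop. The function name `summarise_csv`, its arguments, the `id`/`value` column names and the throwaway demo file are all assumptions made for illustration; this variant also averages only the value column and labels the mean row with the ID found at the first row of the range.

```r
# Hypothetical helper: one row holding the mean of `value` over `mean_rows`,
# followed by the individual rows listed in `keep_rows`.
summarise_csv <- function(path, mean_rows = 2:20,
                          keep_rows = c(50, 120, 150, 175, 200)) {
  csv <- read.csv(path, header = FALSE)[, c(1, 2)]
  names(csv) <- c("id", "value")
  mean_row <- data.frame(
    id    = csv$id[mean_rows[1]],                      # label taken from the first averaged row
    value = mean(csv$value[mean_rows], na.rm = TRUE)   # blanks/NA are ignored
  )
  out <- rbind(mean_row, csv[keep_rows, ])
  out$file <- basename(path)
  rownames(out) <- NULL
  out
}

# Demo on a throwaway csv with 250 rows and 4 columns (made-up data).
demo_dir <- tempfile("csvdemo")
dir.create(demo_dir)
demo_path <- file.path(demo_dir, "demo1.csv")
write.table(
  data.frame(200:449, seq(4, by = 0.01, length.out = 250), "ne", "hx"),
  demo_path, sep = ",", row.names = FALSE, col.names = FALSE
)

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
m <- do.call(rbind, lapply(files, summarise_csv))
print(m)
```

With many files, building a list and calling `rbind` once tends to be faster than calling `rbind` on every iteration, though for about 1000 small files either approach should be workable.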
Is this what you are trying to accomplish?
# Insert the mean of rows 2:20 into row 202
csv[202,"value"] <- mean(csv[2:20,"value"])
# Drop rows 2:20 from the dataframe
csv <- csv[-2:-20,]