Efficient way of applying a Savitzky-Golay filter to data.table rows for certain columns?
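I wrote a function that applies a Savitzky-Golay filter to each row of a data.table. The index of the first column containing measurements is passed as an argument, and all subsequent columns also contain measurements to be filtered. Processed rows are updated in place.
My function works, but it is slow.
How can I change the function so that it runs more efficiently and is more data.table-like?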
MWE:
library(data.table)
library(pracma)
library(datasets)
data(iris)
setDT(iris)
#Reorder columns because function expects columns to apply a filter on
#starting from a defined column to the last column
setcolorder(iris, "Species")
savitzky_golay <- function(dt, id_of_first_sample_col=2, win_size=5) {
  c_names_samples <- colnames(dt)[id_of_first_sample_col:ncol(dt)]
  for (i in seq_len(nrow(dt))) {
    mat <- as.numeric(dt[i, id_of_first_sample_col:ncol(dt)]) # Get sample data of the current row as a numeric vector
    mat <- savgol(mat, fl=win_size, forder=2, dorder=0) # Savitzky-Golay filter
    dt[i, (c_names_samples) := as.list(mat)] # Update columns of current row by reference
  }
  # Nothing useful is returned; the update is done by reference.
}
savitzky_golay(iris)
Attempt:
savitzky_golay_new <- function(dt, id_of_first_sample_col=2, win_size=5) {
  c_names_samples <- colnames(dt)[id_of_first_sample_col:ncol(dt)]
  dt[, (c_names_samples) := asplit(apply(.SD, 1, function(x) savgol(x, fl=win_size, forder=2, dorder=0)), 1),
     .SDcols=c_names_samples]
}
Performance comparison:
microbenchmark::microbenchmark(savitzky_golay_new(dt2),savitzky_golay(dt1))
Unit: milliseconds
                    expr     min       lq     mean   median       uq      max neval
 savitzky_golay_new(dt2) 12.7808 13.69695 15.63821 14.31785 15.17705  31.2701   100
     savitzky_golay(dt1) 71.4231 81.96115 87.97737 86.41265 90.42620 239.7945   100
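A note on the benchmark setup: the creation of dt1 and dt2 is not shown above. Since both functions update their argument by reference, each table presumably has to be an independent copy made with copy(); a plain assignment would only create a second name for the same data.table. The following is a minimal sketch of that difference, not the setup actually used for the timings. The function double_in_place and the names base, dt_alias and dt_copy are hypothetical stand-ins introduced only for this illustration.

```r
library(data.table)

# Hypothetical stand-in for any function that updates its argument by reference
double_in_place <- function(dt, cols) {
  dt[, (cols) := lapply(.SD, function(x) x * 2), .SDcols = cols]
  invisible(NULL)
}

base <- as.data.table(datasets::iris)
setcolorder(base, "Species")
sample_cols <- setdiff(names(base), "Species")

dt_alias <- base        # plain assignment: second name for the same table
dt_copy  <- copy(base)  # copy(): independent table

# Updating the alias also changes base, because both names refer to one object
double_in_place(dt_alias, sample_cols)

identical(base$Sepal.Length, dt_alias$Sepal.Length) # TRUE: base was modified too
identical(base$Sepal.Length, dt_copy$Sepal.Length)  # FALSE: the copy kept the old values
```

Under this assumption, something like dt1 <- copy(iris) and dt2 <- copy(iris) before the benchmark would keep the two functions from working on the same data. Note also that microbenchmark calls each function repeatedly on the same table, so after the first iteration the filter is applied to already filtered values.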