How can I remove all rows in which one of the values satisfies a condition? apply() not working

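I have a data frame with two columns, and I want to remove every row in which either value is less than 0 or greater than a specified number (for the sake of argument, call it 2000).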

Here is the data frame:

structure(list(xx = c(134.697838289433, 222.004361198059, 131.230956160172, 
206.658871436917, 111.25078650042, 241.965831417648, 171.46912254679, 
116.860666678254, 196.894985820028, 135.309699618638, 133.082437475133, 
185.509376072318, 718.998297748551, 745.902984215293, 752.655615982603, 
633.199684348903, 764.983924278636, 694.856525559398, 773.56532078895, 
757.32358575657, 709.924023536199, 658.863564702233, 733.076690816291, 
745.9306541374, 788.134444412421, 759.445624288787, 796.989170170713, 
632.952543475636, 746.103571612919, 715.296116988119, 766.899107551248, 
628.268453830605, 658.574104878488, 689.916530654021, 820.841422812349, 
709.097957368612, 793.109262845978, 716.713801941779, 726.83260343463, 
746.547080776193, 759.644057119419, 757.41275593749, 723.539527360327, 
839.816318612061, 795.655016954661, 766.245386324182, 756.300015395758, 
808.255074043333, 745.915083305187, 685.465492956583, 694.567959198318, 
786.919467838804, 699.521900871042, 749.041223560884, 700.079697765533, 
753.805501259023, 745.080253997501, 846.982894686656, 775.66384433188, 
809.39649823454, 841.009469183585, 790.987061753069, 792.441925234251, 
1377.97739642236, 1353.19738061511, 1259.94435540633, 1276.25060187203, 
1331.26106031956, 1227.68481147557, 1345.95561236514, 1309.51489973952, 
1285.62680259649, 1329.46388049714, 1256.00394500077, 1294.0505313591, 
1349.09440181876, 1294.72661682462, 1339.38577920408, 1277.114896541, 
1267.54884404031, 1291.32793111573, 1254.85565551553, 1298.78499697743, 
1283.89664572036, 1273.92831816666, 1310.221891323, 1327.89682404014, 
1310.81394400863, 595.342571560588, 689.892254230306, 562.390766853428, 
736.319251501976, 609.577261412134, 641.591997384705, 682.957658696869, 
580.320759093636, 560.64984978551, 643.487033739876, 688.457314818318, 
631.156743281308, 659.535909106305), yy = c(1169.70954243065, 
1259.830208937, 1172.21661417439, 1097.62724268622, 1198.15024522658, 
1231.90665701131, 1211.36196331211, 1152.4207367321, 1287.57553021171, 
1120.61366993258, 1234.70366243878, 1258.47454705197, 893.983957068268, 
994.99854601335, 916.330965835536, 947.536265806389, 950.345051732045, 
934.313361799171, 1018.76942964176, 918.182358835366, 1005.51128858608, 
967.577307930044, 997.239384198691, 995.866808447868, 962.292293255127, 
864.624084608006, 895.091604672023, 906.22162647536, 1024.45206885923, 
908.693026118345, 923.625774785301, 931.801569764776, 1007.88553380827, 
848.55309782664, 927.608364899483, 1024.60765786828, 1085.64295260059, 
1057.90632135992, 1195.30607038065, 1151.39888340311, 1168.2831257626, 
1137.15375447446, 1145.42393212912, 1108.89072769468, 1075.15451622384, 
1129.91711324634, 1191.94330388541, 1132.41649984784, 1210.89342724886, 
1100.60339252755, 1083.5987922884, 1056.69487941162, 1150.2707936581, 
1055.75678264632, 1055.53323667429, 1049.79655119467, 1166.86598024805, 
1141.82593378866, 1066.37755267981, 1160.55793904653, 1162.65728735716, 
1060.29360609309, 1107.40480300404, 1825.01445883899, 1802.95011068891, 
1692.84948509132, 1675.97166713074, 1758.10341887143, 1788.48414279738, 
1680.15824054313, 1756.01930833023, 1706.98458587119, 1770.57687329296, 
1692.21991398915, 1835.60585163662, 1790.6487914694, 1787.52076839767, 
1704.25313427813, 1735.96312434652, 1813.02044772293, 1847.21159474717, 
1725.63580525853, 1841.32016678, 1713.80845602987, 1770.39756152819, 
1747.72988313376, 1778.13110060636, 1786.3871288087, 6.01666671271317, 
19.2497357431764, 9.6964112500295, -3.23929433528044, 89.4863211231715, 
86.0082947221296, 42.7982120490919, 2.19886414532234, 12.8780844043502, 
30.694893442471, 7.58386594976601, 83.8385161493349, 36.4551491976192
)), row.names = 100:200, class = "data.frame")

First I wrote a function to eliminate the points that satisfy the condition.

routliers<-function(x){
  if(x>2000|x<0){
    rm(x)
  }
}

Then I used apply() to run that function across the rows (the dput() output above is named cds):

cds<-data.frame(apply(cds,1,routliers))

But this removed all of the points:

length(cds)
[1] 0

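Interestingly, if I replace rm() with print(), the apply() call prints the points I want removed, but I then get the error "arguments imply different number of rows: 0, 2". I'm also not sure whether the function passed to apply() is being applied to both columns, because with print() I don't see any points that satisfy the condition in the second column only. The first column is the x coordinate and the second is the y coordinate. I think the error "arguments imply different number of rows: 0, 2" suggests that only the first value in each row is being tested against the function.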

How can I write code that removes a row if one or more of its values satisfies my condition?

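This is easy when the columns are separate vectors (x <- x[!condition]), but then I can't easily put them back together, so I'd prefer to do this on the data frame of points.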

Let's have the function return TRUE for outliers and FALSE for non-outliers. It can also be vectorized:

is_outlier = function(x) {
  x > 2000 | x < 0
}
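
As a quick check of the vectorized behaviour, here is a minimal sketch on a made-up vector. The values in vals are illustrative assumptions, not taken from the question's data:

```r
is_outlier <- function(x) {
  x > 2000 | x < 0
}

# Hypothetical values: one below 0, one in range, one above 2000
vals <- c(-5, 10, 2500)
flags <- is_outlier(vals)
print(flags)
# [1]  TRUE FALSE  TRUE
```

Because the comparison operators and | work element-wise, the function returns one logical value per input element, which is what makes it usable as a row filter.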

Here's how we can use it to remove the rows that have an outlier in a single column:

cds[!is_outlier(cds$xx), ]

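For two columns, we can combine the is_outlier results with & or |. I can't tell from your text whether you want to remove the rows where xx AND yy are outliers, or the rows where xx OR yy is an outlier. So pick the appropriate version: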

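# Remove rows where xx OR yy is an outlier (keep rows where both are in range)
cds[!is_outlier(cds$xx) & !is_outlier(cds$yy), ]
# Remove rows only where xx AND yy are both outliers
cds[!is_outlier(cds$xx) | !is_outlier(cds$yy), ]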
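
To see how the two versions differ, here is a small sketch on a hypothetical four-row data frame. toy, strict and lenient are made-up names for illustration, and the values are not from cds:

```r
is_outlier <- function(x) {
  x > 2000 | x < 0
}

# Hypothetical toy data, not the question's cds
toy <- data.frame(xx = c(100, -1, 2500, -3),
                  yy = c(500, 900, 50, 3000))

# Keep rows where neither column is an outlier (drops rows 2, 3 and 4)
strict <- toy[!is_outlier(toy$xx) & !is_outlier(toy$yy), ]

# Keep rows where at least one column is in range (drops only row 4)
lenient <- toy[!is_outlier(toy$xx) | !is_outlier(toy$yy), ]

nrow(strict)   # 1
nrow(lenient)  # 3
```

If the goal is "drop the row when any value is out of range", as the question's last sentence suggests, the & version is probably the one to use.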

Please check whether this code works for you; df is the data you shared:

#Code
# Keep only rows with no value below 0 or above 2000
new <- df[rowSums(df < 0 | df > 2000) == 0, ]

Or like this:

#Code 2
new <- df[which(apply(df,1,function(x) sum(x<0 | x>2000))==0),]
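
As a rough sanity check, the sketch below runs both approaches on a hypothetical toy data frame and compares which rows survive. The data and the names bad, keep_rowsums and keep_apply are assumptions made for illustration only:

```r
# Hypothetical toy data standing in for the shared data frame
df <- data.frame(xx = c(100, -1, 2500, 700),
                 yy = c(500, 900, 50, -3))

# Logical matrix: TRUE wherever a cell is out of range
bad <- df < 0 | df > 2000

# Approach 1: count out-of-range cells per row with rowSums()
keep_rowsums <- df[rowSums(bad) == 0, ]

# Approach 2: the same count computed row by row with apply()
keep_apply <- df[which(apply(df, 1, function(x) sum(x < 0 | x > 2000)) == 0), ]

rownames(keep_rowsums)  # "1"
rownames(keep_apply)    # "1"
```

Both versions keep only the first row here. The rowSums() form avoids the per-row function call, so it is likely the faster of the two on large data frames.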