How can I remove all rows in which one of the values satisfies a condition? apply() not working
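I have a data frame with two columns, and I want to remove every row in which one of the values is less than 0 or greater than a specified number (for the sake of argument, let's call it 2000).
Here is the data frame: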
structure(list(xx = c(134.697838289433, 222.004361198059, 131.230956160172,
206.658871436917, 111.25078650042, 241.965831417648, 171.46912254679,
116.860666678254, 196.894985820028, 135.309699618638, 133.082437475133,
185.509376072318, 718.998297748551, 745.902984215293, 752.655615982603,
633.199684348903, 764.983924278636, 694.856525559398, 773.56532078895,
757.32358575657, 709.924023536199, 658.863564702233, 733.076690816291,
745.9306541374, 788.134444412421, 759.445624288787, 796.989170170713,
632.952543475636, 746.103571612919, 715.296116988119, 766.899107551248,
628.268453830605, 658.574104878488, 689.916530654021, 820.841422812349,
709.097957368612, 793.109262845978, 716.713801941779, 726.83260343463,
746.547080776193, 759.644057119419, 757.41275593749, 723.539527360327,
839.816318612061, 795.655016954661, 766.245386324182, 756.300015395758,
808.255074043333, 745.915083305187, 685.465492956583, 694.567959198318,
786.919467838804, 699.521900871042, 749.041223560884, 700.079697765533,
753.805501259023, 745.080253997501, 846.982894686656, 775.66384433188,
809.39649823454, 841.009469183585, 790.987061753069, 792.441925234251,
1377.97739642236, 1353.19738061511, 1259.94435540633, 1276.25060187203,
1331.26106031956, 1227.68481147557, 1345.95561236514, 1309.51489973952,
1285.62680259649, 1329.46388049714, 1256.00394500077, 1294.0505313591,
1349.09440181876, 1294.72661682462, 1339.38577920408, 1277.114896541,
1267.54884404031, 1291.32793111573, 1254.85565551553, 1298.78499697743,
1283.89664572036, 1273.92831816666, 1310.221891323, 1327.89682404014,
1310.81394400863, 595.342571560588, 689.892254230306, 562.390766853428,
736.319251501976, 609.577261412134, 641.591997384705, 682.957658696869,
580.320759093636, 560.64984978551, 643.487033739876, 688.457314818318,
631.156743281308, 659.535909106305), yy = c(1169.70954243065,
1259.830208937, 1172.21661417439, 1097.62724268622, 1198.15024522658,
1231.90665701131, 1211.36196331211, 1152.4207367321, 1287.57553021171,
1120.61366993258, 1234.70366243878, 1258.47454705197, 893.983957068268,
994.99854601335, 916.330965835536, 947.536265806389, 950.345051732045,
934.313361799171, 1018.76942964176, 918.182358835366, 1005.51128858608,
967.577307930044, 997.239384198691, 995.866808447868, 962.292293255127,
864.624084608006, 895.091604672023, 906.22162647536, 1024.45206885923,
908.693026118345, 923.625774785301, 931.801569764776, 1007.88553380827,
848.55309782664, 927.608364899483, 1024.60765786828, 1085.64295260059,
1057.90632135992, 1195.30607038065, 1151.39888340311, 1168.2831257626,
1137.15375447446, 1145.42393212912, 1108.89072769468, 1075.15451622384,
1129.91711324634, 1191.94330388541, 1132.41649984784, 1210.89342724886,
1100.60339252755, 1083.5987922884, 1056.69487941162, 1150.2707936581,
1055.75678264632, 1055.53323667429, 1049.79655119467, 1166.86598024805,
1141.82593378866, 1066.37755267981, 1160.55793904653, 1162.65728735716,
1060.29360609309, 1107.40480300404, 1825.01445883899, 1802.95011068891,
1692.84948509132, 1675.97166713074, 1758.10341887143, 1788.48414279738,
1680.15824054313, 1756.01930833023, 1706.98458587119, 1770.57687329296,
1692.21991398915, 1835.60585163662, 1790.6487914694, 1787.52076839767,
1704.25313427813, 1735.96312434652, 1813.02044772293, 1847.21159474717,
1725.63580525853, 1841.32016678, 1713.80845602987, 1770.39756152819,
1747.72988313376, 1778.13110060636, 1786.3871288087, 6.01666671271317,
19.2497357431764, 9.6964112500295, -3.23929433528044, 89.4863211231715,
86.0082947221296, 42.7982120490919, 2.19886414532234, 12.8780844043502,
30.694893442471, 7.58386594976601, 83.8385161493349, 36.4551491976192
)), row.names = 100:200, class = "data.frame")
First I created a function to eliminate the points that satisfy the condition.
routliers<-function(x){
if(x>2000|x<0){
rm(x)
}
}
Then I used the apply function across rows to eliminate the points with the function above (the dput() output above is named cds):
cds<-data.frame(apply(cds,1,routliers))
But this eliminates all of the points:
length(cds)
[1] 0
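Interestingly, if I replace rm() with print(), the desired points are printed when I call apply, but I get the error "arguments imply differing number of rows: 0, 2". I am also not sure whether the function passed to apply() is applied to both columns, because print() showed no data points that met the condition in the second column only. The first column holds the x coordinates and the second column the y coordinates. I think the error "arguments imply differing number of rows: 0, 2" suggests that only the first value in each row is being tested against the function.
How can I write code that removes a row if one or more of its data points meet my condition?
This is easy to do when the columns are separate vectors (x <- x[!condition]), but then I cannot easily put them back together, so I would prefer to do this on the data frame of points.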
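On why the attempt above comes back empty, here is a minimal sketch using a hypothetical two-row data frame toy (made-up values) and a renamed copy of the function, drop_if_outlier. It wraps the condition in any(), an assumption added here so that if() receives a single TRUE/FALSE; since R 4.2.0 a length-two condition in if() is an error, while older versions used only the first element and warned. What it illustrates: rm(x) only deletes the local variable inside the function call, and the function returns NULL for every row.

```r
# Hypothetical toy data, not the cds data from the question
toy <- data.frame(xx = c(100, -5), yy = c(50, 700))

# Same idea as routliers, with any() so the condition has length one
drop_if_outlier <- function(x) {
  if (any(x > 2000 | x < 0)) {
    rm(x)  # removes only the local copy of x inside this call
  }
}

# Every call returns NULL, so apply() has nothing to collect
res <- apply(toy, 1, drop_if_outlier)
is.null(res)

# The original data frame is untouched
nrow(toy)

# Wrapping NULL in data.frame() gives a data frame with zero columns
length(data.frame(res))
```

Because every call returns NULL, apply() has nothing to assemble, and data.frame(NULL) has zero columns, which would match the length(cds) of 0 shown above.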
Let's make the function return TRUE for outliers and FALSE for non-outliers. It can also be vectorized:
is_outlier = function(x) {
x > 2000 | x < 0
}
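As a quick check of what "vectorized" means here, the sketch below calls is_outlier on a whole vector at once. The vector vals is made up for illustration and is not taken from the question's data.

```r
is_outlier <- function(x) {
  x > 2000 | x < 0
}

# Hypothetical values, including the boundaries 0 and 2000
vals <- c(-3.2, 150, 2500, 0, 2000)

# One logical flag per element, no loop or apply() needed
flags <- is_outlier(vals)
flags
# [1]  TRUE FALSE  TRUE FALSE FALSE
```

Note that the boundary values 0 and 2000 are not flagged, because the comparisons are strict.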
Here is how we can use it to remove the rows that have an outlier in a single column:
cds[!is_outlier(cds$xx), ]
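For two columns, we can combine the is_outlier results with & or |. I can't tell from your text whether you want to remove rows where xx AND yy are outliers, or rows where xx OR yy are outliers, so pick the appropriate version: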
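# keeps only rows where neither value is an outlier (drops a row if xx OR yy is an outlier)
cds[!is_outlier(cds$xx) & !is_outlier(cds$yy), ]
# keeps rows where at least one value is fine (drops a row only if xx AND yy are both outliers)
cds[!is_outlier(cds$xx) | !is_outlier(cds$yy), ]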
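To make the difference between the two versions concrete, here is a small sketch on a hypothetical data frame toy (made-up values, not the cds data). The names keep_strict and keep_lenient are only illustrative.

```r
is_outlier <- function(x) {
  x > 2000 | x < 0
}

# Hypothetical toy data: row 1 is clean, rows 2 and 4 have one
# out-of-range value, row 3 has two
toy <- data.frame(
  xx = c(100, -5, 2500, 300),
  yy = c(50, 700, -1, 2100)
)

# & version: a row survives only if both values are in range
keep_strict <- toy[!is_outlier(toy$xx) & !is_outlier(toy$yy), ]

# | version: a row survives if at least one value is in range
keep_lenient <- toy[!is_outlier(toy$xx) | !is_outlier(toy$yy), ]

nrow(keep_strict)       # 1 (only row 1)
rownames(keep_lenient)  # "1" "2" "4"
```

Going by the question's wording ("if one or more of its data points meet my condition"), the & version looks like the one being asked for.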
Please check whether this code works for you; df is the data you shared:
# Code
new <- df[!(rowSums(df < 0 | df > 2000) > 0), ]
Or like this:
# Code 2
new <- df[which(apply(df, 1, function(x) sum(x < 0 | x > 2000)) == 0), ]
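To see that the two snippets above select the same rows, here is a small sketch on a hypothetical data frame toy (made-up values, not the data from the question). The intermediate name bad_per_row is introduced only for readability.

```r
# Hypothetical toy data: only row 1 has both values inside [0, 2000]
toy <- data.frame(
  xx = c(100, -5, 2500, 300),
  yy = c(50, 700, -1, 2100)
)

# Version 1: count out-of-range values per row on the logical matrix
bad_per_row <- rowSums(toy < 0 | toy > 2000)
new1 <- toy[bad_per_row == 0, ]

# Version 2: the same count computed row by row with apply()
new2 <- toy[which(apply(toy, 1, function(x) sum(x < 0 | x > 2000)) == 0), ]

# Both keep the same row
identical(rownames(new1), rownames(new2))
```

Both versions drop a row as soon as one of its values is out of range. The rowSums form works on the whole logical matrix at once, so it avoids the per-row function calls that apply() makes.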