Having difficulty obtaining subset in R

Question

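I am trying to subset a dataset based on the following requirements:

1. ethnicity is xyz
2. education is a bachelor's degree or higher, i.e. Bachelor's Degree or Graduate Degree
3. I then want to see the income brackets of the people who meet the requirements above. A bracket can be ,000 - ,999 or 0,000 - 4,999.
4. Finally, as my final output, I want to view the subset obtained in item 3 together with a column showing whether each of these people is religious. In the dataset, the corresponding values are religious and not religious.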

So it would look something like this

   income               religious
,000 - ,999      not religious
,000 - ,999         religious
  ....                    ....
  ....                    ....

Keep in mind that everyone listed satisfies requirements 1 and 2.

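Please keep in mind that I am new to programming. I have been trying to figure this out for a long time and have gone through a lot of posts, but I can't seem to get anything to work. How do I fix this? Someone please help.

So as not to take away from the clarity of the post, I will post what I have tried below (but feel free to disregard it, as it is likely rubbish).

To get to step 3, I tried many variations of the following, all of which failed miserably, and I am about to bash my head against the keyboard: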

df$income[which(df$ethnicity == "xyz" & df$education %in% c("Bachelor's Degree", "Graduate Degree"), ]

I have also tried:

race <- df$ethnicity == "xyz"
ba_ma_phd <- df$education %in% c("Graduate Degree", "Bachelor's Degree")
income_sub <- df$income[ba_ma_phd & race]

I believe income_sub gets me to step 3, but I have no idea how to get to step 4.

Answer 1

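library(dplyr)

df %>%
  # keep only the rows that meet requirements 1 and 2
  filter(ethnicity == "xyz" &
         education %in% c("Bachelor's Degree", "Graduate Degree")) %>%
  # one summary row per value of religious
  group_by(religious) %>%
  # min()/max() assume income has a meaningful order (e.g. numeric or an ordered factor)
  summarize(lower_bound = min(income),
            upper_bound = max(income))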

Answer 2

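Turning my comment into an answer, as it got a bit too long.

First, your code: you were nearly there. Because income is a vector and not a data frame, you do not need the trailing comma, i.e. you can use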

df$income[which(df$ethnicity == "xyz" &
         df$education %in% c("Bachelor's Degree", "Graduate Degree"))]
 # note no comma after the closing parenthesis of which()
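One reason to keep the which() wrapper: a plain logical index passes an NA in the condition through as an NA element, whereas which() returns only the positions that are TRUE. A minimal sketch on made-up data (the rows and the low/mid/high income labels are hypothetical; only the column names come from the question):

```r
# Hypothetical toy data: column names follow the question, values are invented.
df <- data.frame(
  ethnicity = c("xyz", "xyz", NA, "abc"),
  education = c("Bachelor's Degree", "High School",
                "Graduate Degree", "Graduate Degree"),
  income    = c("low", "mid", "high", "mid"),
  stringsAsFactors = FALSE
)

# Logical condition: TRUE, FALSE, NA, FALSE (the NA comes from the missing ethnicity)
keep <- df$ethnicity == "xyz" &
  df$education %in% c("Bachelor's Degree", "Graduate Degree")

# Plain logical indexing keeps an NA placeholder for the NA row
with_na <- df$income[keep]

# which() converts the condition to the positions that are TRUE, dropping the NA
without_na <- df$income[which(keep)]

print(with_na)
print(without_na)
```

If the columns used in the condition never contain missing values, both forms should return the same thing.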

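If you want to create a subset of the data, do not include df$income at the start; just use df and keep the comma. This subsets the rows of your data but keeps all the columns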

sub_df <- df[which(df$ethnicity == "xyz" &
       df$education %in% c("Bachelor's Degree", "Graduate Degree")), ]
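For what it is worth, base R's subset() does the same row filtering and lets you refer to the columns without the df$ prefix; like which(), it drops rows where the condition is NA. A small sketch on hypothetical toy data (the values, and the names degrees and sub_alt, are invented for illustration):

```r
# Hypothetical toy data: column names follow the question, values are invented.
df <- data.frame(
  ethnicity = c("xyz", "xyz", "abc", "xyz"),
  education = c("Bachelor's Degree", "High School",
                "Graduate Degree", "Graduate Degree"),
  income    = c("low", "mid", "high", "high"),
  religious = c("religious", "not religious", "religious", "not religious"),
  stringsAsFactors = FALSE
)

degrees <- c("Bachelor's Degree", "Graduate Degree")

# Bracket form: rows selected by position, all columns kept
sub_df <- df[which(df$ethnicity == "xyz" & df$education %in% degrees), ]

# subset() form: the condition is evaluated inside df, so bare column names work
sub_alt <- subset(df, ethnicity == "xyz" & education %in% degrees)

print(sub_df)
print(sub_alt)
```

The subset() help page describes it as a convenience function for interactive use, so the bracket form is probably the safer habit inside functions or scripts.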

Then, to look at the income levels of the subsetted data, you can use table

table(sub_df$income)

You can use table again to check the number of observations in each income bracket by religious status.

table(sub_df$income, sub_df$religious)
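To make the two-way table concrete, here is a rough sketch on a hand-made sub_df (the five rows and the low/high income labels are hypothetical stand-ins for the real brackets):

```r
# Hypothetical stand-in for the subset: values are invented for illustration.
sub_df <- data.frame(
  income    = c("low", "high", "high", "low", "high"),
  religious = c("religious", "not religious", "religious",
                "religious", "religious"),
  stringsAsFactors = FALSE
)

# Rows are income brackets, columns are religious status, cells are counts
counts <- table(sub_df$income, sub_df$religious)
print(counts)

# A single cell can be looked up by its row and column labels
high_religious <- counts["high", "religious"]
print(high_religious)
```

Combinations that never occur still appear in the table, with a count of 0.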

If you only want to select the income and religious columns, you can also use [

sub_df[c("religious", "income")]
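The row filter and the column selection can also be combined into a single [ call, which gives the two-column output described in the question. A minimal sketch, again on hypothetical toy data (the values and the names keep and result are invented):

```r
# Hypothetical toy data: column names follow the question, values are invented.
df <- data.frame(
  ethnicity = c("xyz", "abc", "xyz", "xyz"),
  education = c("Graduate Degree", "Bachelor's Degree",
                "High School", "Bachelor's Degree"),
  income    = c("high", "mid", "low", "mid"),
  religious = c("not religious", "religious", "religious", "religious"),
  stringsAsFactors = FALSE
)

# Row positions that meet requirements 1 and 2
keep <- which(df$ethnicity == "xyz" &
              df$education %in% c("Bachelor's Degree", "Graduate Degree"))

# Rows go before the comma, the wanted columns after it
result <- df[keep, c("income", "religious")]
print(result)
```

The row names in the result are the original row numbers, which can be handy for tracing a row back to df.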
