Counting categorical variables within group_by()
I am examining conservation easement data from NCED. I have a data frame of parcels that have some repeated IDs and owners. I want to group the repeated IDs into a single row with a count of the distinct number of owners, but so far I am only getting a count of the number of rows for each ID.
library(dplyr)
uniqueID <- 1:10
parcelID <- c('a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c')
owner <- c('owner1', 'owner1', 'owner1', 'owner2', 'owner3',
           'owner2', 'owner2', 'owner2', 'owner3', 'owner1')
mydat1 <- data.frame(uniqueID, parcelID, owner)
# My attempt: this counts rows per (parcelID, owner) pair, not distinct owners per parcel
numberOwners <- mydat1 %>% group_by(parcelID, owner) %>% tally()
My desired output is:
parcelID_grouped nOwners
1 a 3
2 b 1
3 c 2
Using dplyr
There are a couple of ways to do this:
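library(dplyr)
# Option 1: drop duplicate (parcelID, owner) pairs, then count rows per parcel
mydat1 %>% distinct(parcelID, owner) %>% count(parcelID)
# Option 2: count the distinct owners within each parcel directly
mydat1 %>% group_by(parcelID) %>% summarise(n = n_distinct(owner))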
Both calls result in:
# parcelID n
# 1 a 3
# 2 b 1
# 3 c 2
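To make the difference from the attempt in the question concrete, here is a minimal sketch that runs both side by side. The column names parcelID_grouped and nOwners come from the desired output in the question; the intermediate name perPair is only illustrative.

library(dplyr)

mydat1 <- data.frame(
  uniqueID = 1:10,
  parcelID = c('a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c'),
  owner    = c('owner1', 'owner1', 'owner1', 'owner2', 'owner3',
               'owner2', 'owner2', 'owner2', 'owner3', 'owner1')
)

# Grouping by both columns gives one row per (parcelID, owner) pair,
# where n is the number of rows in that pair
perPair <- mydat1 %>% group_by(parcelID, owner) %>% tally()

# Grouping by parcelID alone and counting distinct owners gives one row per parcel;
# rename() then matches the column names asked for in the question
numberOwners <- mydat1 %>%
  group_by(parcelID) %>%
  summarise(nOwners = n_distinct(owner)) %>%
  rename(parcelID_grouped = parcelID)

Here perPair has one row for each parcel and owner combination, while numberOwners has one row per parcel, which is the shape the question asks for.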
Using data.table:
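library(data.table)
setDT(mydat1)                                # convert to data.table by reference
mydat1[, uniqueID := NULL]                   # drop the row identifier
mydat1 <- unique(mydat1)                     # keep one row per (parcelID, owner) pair
mydat1[, nOwners := .N, by = parcelID]       # rows per parcel = distinct owners
mydat1[, owner := NULL]                      # owner is no longer needed
mydat1 <- unique(mydat1)                     # collapse to one row per parcel
setnames(mydat1, "parcelID", "parcelID_grouped")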
You will get the desired output:
parcelID_grouped nOwners
1: a 3
2: b 1
3: c 2
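Note that the steps above overwrite mydat1 and drop its uniqueID and owner columns. As a possible alternative, the sketch below keeps the original table intact and does the grouping in a single call. It assumes a data.table version that provides uniqueN(); the names dt and res are only illustrative.

library(data.table)

dt <- data.table(
  uniqueID = 1:10,
  parcelID = c('a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c'),
  owner    = c('owner1', 'owner1', 'owner1', 'owner2', 'owner3',
               'owner2', 'owner2', 'owner2', 'owner3', 'owner1')
)

# Count distinct owners per parcel; naming the grouping expression in `by`
# renames the grouping column in the result, and dt itself is not modified
res <- dt[, .(nOwners = uniqueN(owner)), by = .(parcelID_grouped = parcelID)]

This trades the explicit intermediate steps for a single expression, which may be easier to reuse when the original columns are still needed afterwards.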