Using sort and rank in R on multiple columns
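I am trying to rank my hospital names by the lowest rate within each state.
When several hospitals have the same rate, the tie should be broken by sorting the hospital names alphabetically. So far I have managed to rank by rate within each state and to sort by hospital name, but I can't figure out how to break the ties and rank without skipping numbers.
This is what I have so far, using the following code: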
outcome_data <- read.csv("outcome-of-care-measures.csv", na.strings = "Not Available", stringsAsFactors = FALSE)  # Read the CSV file
myData <- outcome_data[, c(2, 7, 11)]  # Keep only hospital name, state, and heart attack rate
names(myData)[3] <- "rate"  # Assumed step: the output shows the third column named "rate"
arr1 <- myData[complete.cases(myData[, 3]), ]  # Remove rows with an NA rate
arr2 <- arr1[order(arr1[[2]], arr1[[3]], arr1[[1]]), ]  # Sort by state, then rate, then hospital name
arr3 <- transform(arr2, rank = ave(rate, State, FUN = function(x) rank(x, ties.method = "min")))  # Rank by rate within each state
The output I currently get is:
Hospital.Name State rate rank
SOUTH PENINSULA HOSPITAL AK 10.8 1
YUKON KUSKOKWIM DELTA REG HOSPITAL AK 11.2 2
MAT-SU REGIONAL MEDICAL CENTER AK 11.4 3
PEACEHEALTH KETCHIKAN MEDICAL CENTER AK 11.4 3
ALASKA NATIVE MEDICAL CENTER AK 11.6 5
BARTLETT REGIONAL HOSPITAL AK 11.6 5
CENTRAL PENINSULA GENERAL HOSPITAL AK 11.6 5
PROVIDENCE ALASKA MEDICAL CENTER AK 12.4 8
ALASKA REGIONAL HOSPITAL AK 13.4 9
FAIRBANKS MEMORIAL HOSPITAL AK 15.6 10
GEORGE H. LANIER MEMORIAL HOSPITAL AL 8.8 1
EVERGREEN MEDICAL CENTER AL 9.1 2
BAPTIST MEDICAL CENTER EAST AL 9.6 3
LAWRENCE MEDICAL CENTER AL 9.9 4
ANDALUSIA REGIONAL HOSPITAL AL 10.1 5
JACKSON HOSPITAL & CLINIC INC AL 10.2 6
BIRMINGHAM VA MEDICAL CENTER AL 10.4 7
FLORALA MEMORIAL HOSPITAL AL 10.4 7
GROVE HILL MEMORIAL HOSPITAL AL 10.4 7
SPRINGHILL MEDICAL CENTER AL 10.4 7
WEDOWEE HOSPITAL AL 10.4 7
PARKWAY MEDICAL CENTER AL 10.5 12
ST VINCENT'S BIRMINGHAM AL 10.6 13
WIREGRASS MEDICAL CENTER AL 10.6 13
GADSDEN REGIONAL MEDICAL CENTER AL 10.7 15
HALE COUNTY HOSPITAL AL 10.7 15
MOBILE INFIRMARY AL 10.7 15
But what I would like to get is:
Hospital.Name State rate rank
SOUTH PENINSULA HOSPITAL AK 10.8 1
YUKON KUSKOKWIM DELTA REG HOSPITAL AK 11.2 2
MAT-SU REGIONAL MEDICAL CENTER AK 11.4 3
PEACEHEALTH KETCHIKAN MEDICAL CENTER AK 11.4 4
ALASKA NATIVE MEDICAL CENTER AK 11.6 5
BARTLETT REGIONAL HOSPITAL AK 11.6 6
CENTRAL PENINSULA GENERAL HOSPITAL AK 11.6 7
PROVIDENCE ALASKA MEDICAL CENTER AK 12.4 8
ALASKA REGIONAL HOSPITAL AK 13.4 9
FAIRBANKS MEMORIAL HOSPITAL AK 15.6 10
GEORGE H. LANIER MEMORIAL HOSPITAL AL 8.8 1
EVERGREEN MEDICAL CENTER AL 9.1 2
BAPTIST MEDICAL CENTER EAST AL 9.6 3
LAWRENCE MEDICAL CENTER AL 9.9 4
ANDALUSIA REGIONAL HOSPITAL AL 10.1 5
JACKSON HOSPITAL & CLINIC INC AL 10.2 6
BIRMINGHAM VA MEDICAL CENTER AL 10.4 7
FLORALA MEMORIAL HOSPITAL AL 10.4 8
GROVE HILL MEMORIAL HOSPITAL AL 10.4 9
SPRINGHILL MEDICAL CENTER AL 10.4 10
WEDOWEE HOSPITAL AL 10.4 11
PARKWAY MEDICAL CENTER AL 10.5 12
ST VINCENT'S BIRMINGHAM AL 10.6 13
WIREGRASS MEDICAL CENTER AL 10.6 14
GADSDEN REGIONAL MEDICAL CENTER AL 10.7 15
HALE COUNTY HOSPITAL AL 10.7 16
MOBILE INFIRMARY AL 10.7 17
Any ideas?
After the order step, we need a sequence number within each group:
library(dplyr)
arr2 %>%
  group_by(State) %>%
  mutate(rank = row_number())
Or, if we start from 'arr1':
arr1 %>%
  arrange(State, rate, Hospital.Name) %>%
  group_by(State) %>%
  mutate(rank = row_number())
Or use ave from base R:
with(arr2, ave(seq_along(State), State, FUN = seq_along))
#[1] 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
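To see why a per-group sequence number works where rank(ties.method = "min") does not, here is a minimal base R sketch. The data frame `toy`, its hospital names, and its rates are invented for illustration and are not taken from the real CSV; the column names `rank_min` and `rank_seq` are likewise hypothetical.

```r
# Toy data, invented for illustration (not from the real CSV)
toy <- data.frame(
  Hospital.Name = c("B HOSPITAL", "A HOSPITAL", "C HOSPITAL",
                    "E HOSPITAL", "D HOSPITAL"),
  State = c("AK", "AK", "AK", "AL", "AL"),
  rate = c(11.4, 11.4, 10.8, 9.1, 9.1),
  stringsAsFactors = FALSE
)

# Sort by state, then rate, then hospital name (the name breaks ties)
toy <- toy[order(toy$State, toy$rate, toy$Hospital.Name), ]

# rank() with ties.method = "min" gives tied rows the same rank,
# then skips ahead, which is the behaviour the question wants to avoid
toy$rank_min <- ave(toy$rate, toy$State,
                    FUN = function(x) rank(x, ties.method = "min"))

# A running counter within each state simply follows the sort order,
# so every row gets its own consecutive number
toy$rank_seq <- ave(seq_along(toy$State), toy$State, FUN = seq_along)

print(toy, row.names = FALSE)
```

On this toy data the three AK rows get 1, 2, 2 in rank_min but 1, 2, 3 in rank_seq, because the tie at 11.4 has already been resolved by the alphabetical sort.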
With data.table this is relatively simple:
library(data.table)
# Read only the relevant columns from the csv file using data.table::fread
# (assumes the CSV header contains exactly these column names)
outcome_data <- fread("outcome-of-care-measures.csv",
                      na.strings = "Not Available",
                      select = c("Hospital.Name", "State", "rate"))
# Drop rows with NA values using data.table's na.omit method
outcome_data <- na.omit(outcome_data)
## Use data.table::setkey to sort/index by State, then rate, then hospital name
setkey(outcome_data, State, rate, Hospital.Name)
## Add a rank column by state; the order within groups follows the key order above
## (.N is the number of rows in each State group)
outcome_data[, rank := seq_len(.N), by = .(State)]
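The approaches above all rely on the rows already being sorted by State, rate, and Hospital.Name. If the rows have to stay in their original order, one possible base R variant is to compute the sort order inside each group and invert it. This is only a sketch under assumptions: the data frame `toy` is made up, and `rank_in_group` is a hypothetical helper, not something from the answers above.

```r
# Toy data in arbitrary (unsorted) order, invented for illustration
toy <- data.frame(
  Hospital.Name = c("B HOSPITAL", "D HOSPITAL", "A HOSPITAL",
                    "E HOSPITAL", "C HOSPITAL"),
  State = c("AK", "AL", "AK", "AL", "AK"),
  rate = c(11.4, 9.1, 11.4, 9.1, 10.8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: given the row indices of one state, return the
# tie-free rank of each row (by rate, then by hospital name)
rank_in_group <- function(idx) {
  ord <- order(toy$rate[idx], toy$Hospital.Name[idx])  # sorted positions
  r <- integer(length(idx))
  r[ord] <- seq_along(idx)                             # invert the permutation
  r
}

# ave() passes each state's row indices to the helper and puts the
# results back in the original row order
toy$rank <- ave(seq_len(nrow(toy)), toy$State, FUN = rank_in_group)

print(toy, row.names = FALSE)
```

The rows are never reordered here; only the rank column is added, so the result can be sorted later (or not at all) without affecting the ranks.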