Getting a list of companies with dual share classes using cusip

Question

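I am a beginner in R, and I have a table with two columns. The first column is the ID. The second column is a label (actually the CUSIP from CRSP data). The label is an eight-digit number: the first six digits identify the ID, and the last two digits can vary depending on certain attributes of the ID.

I want a list of the IDs that have two different labels, that is, IDs for which one of the two labels ends in a different pair of digits.

For example, if the table looks like this,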

ID	label
1	11223330
1	11223341
2	11224430
3	11225530
3	11225531
4	11226630
5	11227730
5	11227753

then I would like to see

ID	label
1	11223330
1	11223341
3	11225530
3	11225531
5	11227730
5	11227753

Thank you very much!

Answer 1

If you have only these two columns, or if you have other variables but no duplicate ID-label pairs, you can use this:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

dat <- structure(list(ID = c(1L, 1L, 2L, 3L, 3L, 4L, 5L, 5L), label = c(11223330L, 
11223341L, 11224430L, 11225530L, 11225531L, 11226630L, 11227730L, 
11227753L)), row.names = c(NA, 8L), class = "data.frame")

dat <- dat %>% 
  mutate(var3 = 8:1)

dat %>% 
  group_by(ID) %>%  
  filter(n() >= 2) 
#> # A tibble: 6 × 3
#> # Groups:   ID [3]
#>      ID    label  var3
#>   <int>    <int> <int>
#> 1     1 11223330     8
#> 2     1 11223341     7
#> 3     3 11225530     5
#> 4     3 11225531     4
#> 5     5 11227730     2
#> 6     5 11227753     1
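The filter above counts rows per ID, so an ID whose single label is merely repeated would also pass. As a minimal sketch (the small data frame here is made up for illustration and is not the asker's data), counting distinct labels with `n_distinct()` keeps only IDs that really have two or more different labels, while leaving all of their rows in place:

```r
library(dplyr)

# Illustrative data: ID 2 has the same label twice, so it should NOT qualify
dat_dup <- data.frame(
  ID    = c(1L, 1L, 2L, 2L, 3L, 3L),
  label = c(11223330L, 11223341L, 11224430L, 11224430L, 11225530L, 11225531L)
)

res <- dat_dup %>%
  group_by(ID) %>%
  filter(n_distinct(label) >= 2) %>%  # at least two different labels per ID
  ungroup()

res
```

With `n() >= 2` instead, ID 2 would be kept because it has two rows, even though both rows carry the same label.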

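However, if you have duplicate ID-label pairs and other variables that you want to keep, and you need only one row per ID-label pair, you can turn the other variables into list columns: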

dat <- structure(list(ID = c(1L, 1L, 2L, 3L, 3L, 4L, 5L, 5L), label = c(11223330L, 
11223341L, 11224430L, 11225530L, 11225531L, 11226630L, 11227730L, 
11227753L)), row.names = c(NA, 8L), class = "data.frame")
dat <- bind_rows(dat, dat)
dat <- dat %>% 
  mutate(var3 = 16:1)

out <- dat %>% 
  group_by(ID, label) %>% 
  summarise(across(everything(), ~list(.x))) %>%
  ungroup() %>% 
  group_by(ID) %>% 
  filter(n() >= 2) 
#> `summarise()` has grouped output by 'ID'. You can override using the `.groups`
#> argument.

out
#> # A tibble: 6 × 3
#> # Groups:   ID [3]
#>      ID    label var3     
#>   <int>    <int> <list>   
#> 1     1 11223330 <int [2]>
#> 2     1 11223341 <int [2]>
#> 3     3 11225530 <int [2]>
#> 4     3 11225531 <int [2]>
#> 5     5 11227730 <int [2]>
#> 6     5 11227753 <int [2]>

out$var3
#> [[1]]
#> [1] 16  8
#> 
#> [[2]]
#> [1] 15  7
#> 
#> [[3]]
#> [1] 13  5
#> 
#> [[4]]
#> [1] 12  4
#> 
#> [[5]]
#> [1] 10  2
#> 
#> [[6]]
#> [1] 9 1

Created on 2022-04-25 by the reprex package (v2.0.1)
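The question states that the first six digits of the label identify the ID and the last two digits vary. As a sketch under that assumption, the same selection can be driven by the label alone, which may help when no separate ID column is available. The column names `issuer` and `issue` are hypothetical, and the code assumes the labels are stored as integers as in the example; real CUSIPs can contain letters, in which case the label would be a character column and `substr()` would be the natural tool instead of integer arithmetic.

```r
library(dplyr)

# Illustrative labels only; not taken from CRSP
labels <- data.frame(
  label = c(11223330L, 11223341L, 11224430L, 11225530L, 11225531L)
)

res_label <- labels %>%
  mutate(
    issuer = label %/% 100L,  # first six digits of the eight-digit label
    issue  = label %% 100L    # last two digits
  ) %>%
  group_by(issuer) %>%
  filter(n_distinct(issue) >= 2) %>%  # keep prefixes with 2+ distinct endings
  ungroup()

res_label
```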
