Changing a column to look like a rank (SQL or R)

I have a very large table that follows this structure (I have simplified it here):

Product  Line   Name         Quantity  Unit  Cost
Pepe     10000  Lucia        4         UD    8
Pepe     70000  Santiago     7         UD    5.5
Pepe     70000  Mariangeles  10        KG    6
Antonio  10000  Naiara       4         KG    8
Antonio  70000  Toni         7         KG    3
Vanesa   10000  Lucia        4         UD    8
Vanesa   50000  Santiago     7         KG    8
Vanesa   50000  Toni         10        KG    3
Vanesa   50000  Gines        4         KG    8

I need to transform the column Line so that the repeated numbers (70000, 50000...) look like a rank within each product (10000, 20000, 30000, 40000, 50000...).

Product  Line   Name         Quantity  Unit  Cost
Pepe     10000  Lucia        4         UD    8
Pepe     20000  Santiago     7         UD    5.5
Pepe     30000  Mariangeles  10        KG    6
Antonio  10000  Naiara       4         KG    8
Antonio  20000  Toni         7         KG    3
Vanesa   10000  Lucia        4         UD    8
Vanesa   20000  Santiago     7         KG    8
Vanesa   30000  Toni         10        KG    3
Vanesa   40000  Gines        4         KG    8

I can do it using SQL (DBeaver or Microsoft Access) or R. I was thinking of a loop in R or a complex SQL query using count(), but any help would be much appreciated.

Thank you very much.

In SQL you can use a window function:

-- number the rows within each product, then scale to 10000, 20000, ...
select t.*, 10000 * row_number() over (partition by product order by line) as rn
from yourtable t

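I used Line to order the rows within each group; you can change it to whatever makes sense. Rows that share the same Line within a product are ties, so add a tie-breaking column to the order by if their relative order matters.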

In R, we can group by 'Product' and multiply row_number() by the first element of 'Line':

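library(dplyr)

df1 %>%
    group_by(Product) %>%
    # row_number() is 1, 2, 3, ... within each Product; first(Line) is the group's first Line (10000 here)
    mutate(Line = row_number() * first(Line)) %>%
    ungroup()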

-output

# A tibble: 9 x 6
#  Product  Line Name        Quantity Unit   Cost
#  <chr>   <int> <chr>          <int> <chr> <dbl>
#1 Pepe    10000 Lucia              4 UD      8  
#2 Pepe    20000 Santiago           7 UD      5.5
#3 Pepe    30000 Mariangeles       10 KG      6  
#4 Antonio 10000 Naiara             4 KG      8  
#5 Antonio 20000 Toni               7 KG      3  
#6 Vanesa  10000 Lucia              4 UD      8  
#7 Vanesa  20000 Santiago           7 KG      8  
#8 Vanesa  30000 Toni              10 KG      3  
#9 Vanesa  40000 Gines              4 KG      8  
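
The result above depends on the first Line of every Product being 10000. If that is not guaranteed, one option is to multiply by a fixed step instead. A minimal sketch, assuming the step should always be 10000 and that the current row order is the order you want to rank by; `toy` and `toy_ranked` are made-up names for illustration, not part of the original data:

```r
library(dplyr)

# Hypothetical toy data: the first Line of product "B" is 50000, not 10000
toy <- data.frame(
  Product = c("A", "A", "A", "B", "B"),
  Line    = c(10000L, 70000L, 70000L, 50000L, 50000L),
  stringsAsFactors = FALSE
)

toy_ranked <- toy %>%
  group_by(Product) %>%
  # row_number() counts 1, 2, 3, ... within each Product, in the current row order
  mutate(Line = row_number() * 10000L) %>%
  ungroup()

toy_ranked$Line
# 10000 20000 30000 10000 20000
```

With `first(Line)` the rows of product "B" would come out as 50000 and 100000, which is why the fixed step may be the safer choice on real data.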

data

df1 <- structure(list(Product = c("Pepe", "Pepe", "Pepe", "Antonio", 
"Antonio", "Vanesa", "Vanesa", "Vanesa", "Vanesa"), Line = c(10000L, 
70000L, 70000L, 10000L, 70000L, 10000L, 50000L, 50000L, 50000L
), Name = c("Lucia", "Santiago", "Mariangeles", "Naiara", "Toni", 
"Lucia", "Santiago", "Toni", "Gines"), Quantity = c(4L, 7L, 10L, 
4L, 7L, 4L, 7L, 10L, 4L), Unit = c("UD", "UD", "KG", "KG", "KG", 
"UD", "KG", "KG", "KG"), Cost = c(8, 5.5, 6, 8, 3, 8, 8, 3, 8
)), class = "data.frame", row.names = c(NA, -9L))
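
If dplyr is not available, the same per-group counter can probably be built in base R with ave(). This is only a sketch under the same assumptions as before (fixed step of 10000, rank follows the existing row order); `toy` is a hypothetical stand-in for the real table:

```r
# Hypothetical toy data with the same shape as the Product/Line columns
toy <- data.frame(
  Product = c("A", "A", "A", "B", "B"),
  Line    = c(10000L, 70000L, 70000L, 50000L, 50000L),
  stringsAsFactors = FALSE
)

# ave() applies FUN to its first argument separately within each Product;
# seq_along() then yields 1, 2, 3, ... for the rows of that group
toy$Line <- ave(seq_along(toy$Product), toy$Product, FUN = seq_along) * 10000L

toy$Line
# 10000 20000 30000 10000 20000
```

Because ave() writes each group's result back into the original positions, the rows of a Product do not need to be adjacent for this to work.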