Changing a column to look like a rank (SQL or R)
I have a very large table that follows this structure (simplified here):
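Product | Line | Name | Quantity | Unit | Cost |
---|---|---|---|---|---|
Pepe | 10000 | Lucia | 4 | UD | 8 |
Pepe | 70000 | Santiago | 7 | UD | 5.5 |
Pepe | 70000 | Mariangeles | 10 | KG | 6 |
Antonio | 10000 | Naiara | 4 | KG | 8 |
Antonio | 70000 | Toni | 7 | KG | 3 |
Vanesa | 10000 | Lucia | 4 | UD | 8 |
Vanesa | 50000 | Santiago | 7 | KG | 8 |
Vanesa | 50000 | Toni | 10 | KG | 3 |
Vanesa | 50000 | Gines | 4 | KG | 8 |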
I need to transform the Line column so that the repeated numbers (70000, 50000...) look like a rank within each product (10000, 20000, 30000, 40000, 50000...).
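Product | Line | Name | Quantity | Unit | Cost |
---|---|---|---|---|---|
Pepe | 10000 | Lucia | 4 | UD | 8 |
Pepe | 20000 | Santiago | 7 | UD | 5.5 |
Pepe | 30000 | Mariangeles | 10 | KG | 6 |
Antonio | 10000 | Naiara | 4 | KG | 8 |
Antonio | 20000 | Toni | 7 | KG | 3 |
Vanesa | 10000 | Lucia | 4 | UD | 8 |
Vanesa | 20000 | Santiago | 7 | KG | 8 |
Vanesa | 30000 | Toni | 10 | KG | 3 |
Vanesa | 40000 | Gines | 4 | KG | 8 |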
I can do this with SQL (DBeaver or Microsoft Access) or with R. I was thinking of an R loop or a complex SQL query using count(), but any help would be much appreciated.
Thanks a lot.
In SQL you can use a window function:
select * , 10000 * ROW_NUMBER() over (partition by product order by line) as rn
from yourtable
I used Line to order the rows within each group; you can change that to whatever makes sense.
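As a quick check of the query above, here is a minimal sketch that runs it against the sample rows using Python's built-in sqlite3 module. It assumes SQLite 3.25 or later, which is needed for window functions. The `rowid` tie-breaker, the `new_line` alias, and the helper name `renumber_lines` are my additions, not part of the original answer. Without a tie-breaker, rows that share the same Line (such as the two 70000 rows for Pepe) may be numbered in an unspecified order.

```python
import sqlite3

# Sample rows copied from the question: (Product, Line, Name, Quantity, Unit, Cost)
ROWS = [
    ("Pepe", 10000, "Lucia", 4, "UD", 8),
    ("Pepe", 70000, "Santiago", 7, "UD", 5.5),
    ("Pepe", 70000, "Mariangeles", 10, "KG", 6),
    ("Antonio", 10000, "Naiara", 4, "KG", 8),
    ("Antonio", 70000, "Toni", 7, "KG", 3),
    ("Vanesa", 10000, "Lucia", 4, "UD", 8),
    ("Vanesa", 50000, "Santiago", 7, "KG", 8),
    ("Vanesa", 50000, "Toni", 10, "KG", 3),
    ("Vanesa", 50000, "Gines", 4, "KG", 8),
]


def renumber_lines(rows):
    """Return (Product, Name, new_line) tuples numbered 10000, 20000, ... per product."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE yourtable "
        "(Product TEXT, Line INTEGER, Name TEXT, Quantity INTEGER, Unit TEXT, Cost REAL)"
    )
    con.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?, ?, ?)", rows)
    # Assumption: rowid (SQLite-specific) is used to break ties between rows
    # with the same Line, so numbering follows insertion order.
    query = """
        SELECT Product, Name,
               10000 * ROW_NUMBER() OVER (
                   PARTITION BY Product ORDER BY Line, rowid
               ) AS new_line
        FROM yourtable
        ORDER BY rowid
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in renumber_lines(ROWS):
        print(row)
```

In a real table you would replace `rowid` with whatever column defines the intended order within a product.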
In R, we can group by 'Product' and multiply row_number() by the first element of 'Line':
library(dplyr)

df1 %>%
  group_by(Product) %>%                          # number rows within each product
  mutate(Line = row_number() * first(Line)) %>%  # 1, 2, 3, ... times the group's first Line
  ungroup()
Output:
# A tibble: 9 x 6
# Product Line Name Quantity Unit Cost
# <chr> <int> <chr> <int> <chr> <dbl>
#1 Pepe 10000 Lucia 4 UD 8
#2 Pepe 20000 Santiago 7 UD 5.5
#3 Pepe 30000 Mariangeles 10 KG 6
#4 Antonio 10000 Naiara 4 KG 8
#5 Antonio 20000 Toni 7 KG 3
#6 Vanesa 10000 Lucia 4 UD 8
#7 Vanesa 20000 Santiago 7 KG 8
#8 Vanesa 30000 Toni 10 KG 3
#9 Vanesa 40000 Gines 4 KG 8
Data:
df1 <- structure(list(Product = c("Pepe", "Pepe", "Pepe", "Antonio",
"Antonio", "Vanesa", "Vanesa", "Vanesa", "Vanesa"), Line = c(10000L,
70000L, 70000L, 10000L, 70000L, 10000L, 50000L, 50000L, 50000L
), Name = c("Lucia", "Santiago", "Mariangeles", "Naiara", "Toni",
"Lucia", "Santiago", "Toni", "Gines"), Quantity = c(4L, 7L, 10L,
4L, 7L, 4L, 7L, 10L, 4L), Unit = c("UD", "UD", "KG", "KG", "KG",
"UD", "KG", "KG", "KG"), Cost = c(8, 5.5, 6, 8, 3, 8, 8, 3, 8
)), class = "data.frame", row.names = c(NA, -9L))
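One detail worth noting: `row_number() * first(Line)` gives 10000, 20000, ... only because each product's first Line in this data is 10000. The sketch below shows the same per-group numbering with a fixed step instead, so it does not depend on the first value. It is written in plain Python rather than R purely as an illustration of the logic; `renumber_by_group` and `SAMPLE` are hypothetical names, and only three of the six columns are carried along.

```python
from collections import defaultdict

# (Product, Line, Name) taken from the sample data; other columns omitted for brevity.
SAMPLE = [
    ("Pepe", 10000, "Lucia"),
    ("Pepe", 70000, "Santiago"),
    ("Pepe", 70000, "Mariangeles"),
    ("Antonio", 10000, "Naiara"),
    ("Antonio", 70000, "Toni"),
    ("Vanesa", 10000, "Lucia"),
    ("Vanesa", 50000, "Santiago"),
    ("Vanesa", 50000, "Toni"),
    ("Vanesa", 50000, "Gines"),
]


def renumber_by_group(rows, step=10000):
    """Replace each row's Line with step * (position of the row within its Product)."""
    counts = defaultdict(int)  # how many rows of each product have been seen so far
    result = []
    for product, _old_line, name in rows:
        counts[product] += 1
        result.append((product, counts[product] * step, name))
    return result


if __name__ == "__main__":
    for row in renumber_by_group(SAMPLE):
        print(row)
```

The rows are numbered in the order they appear, which matches what the dplyr pipeline does on an unsorted data frame.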