Using groupby and duplicated in pandas

Question

I'm trying to translate some R dplyr code into Python pandas, but I don't get the same result when I use groupby() and duplicated().

I have a dataset of size (20000 × 3) that looks like this:

Product	Trade	Crop
Fungi	VIC	Grapes
ASH	CAN	APPLE
FUNGI	CAN	SEED

In R, they wrote the following code:

Products_table <- Products_table %>% group_by(product, crop) %>% filter(!duplicated(trade))

They get a smaller dataset as output, of size (5000 × 3). I think the duplicate rows have been removed.
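In dplyr, `group_by(product, crop) %>% filter(!duplicated(trade))` keeps, inside each (product, crop) group, only the first row for each trade value. A minimal sketch of a literal pandas translation, assuming lowercase column names `product`, `trade` and `crop` as in the R code, and adding one repeated FUNGI/CAN/SEED row purely for illustration, applies `duplicated()` per group through `groupby(...)["trade"].transform`:

```python
import pandas as pd

# Sample rows from the question plus one hypothetical repeat of the last row
df = pd.DataFrame({
    "product": ["Fungi", "ASH", "FUNGI", "FUNGI"],
    "trade":   ["VIC", "CAN", "CAN", "CAN"],
    "crop":    ["Grapes", "APPLE", "SEED", "SEED"],
})

# For each (product, crop) group, flag rows whose trade value already appeared
# earlier in that group; the result is aligned to df's original index
dup_in_group = (
    df.groupby(["product", "crop"])["trade"]
    .transform(lambda s: s.duplicated())
    .astype(bool)
)

# Keep the rows that are not duplicates, like filter(!duplicated(trade)) in dplyr
result = df[~dup_in_group]
print(result)
```

The original row order is preserved, which matches how dplyr's `filter` behaves on a grouped data frame.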

I tried the same thing in Python pandas:

 product_table = product_table.groupby(['product','crop']).reset_index(drop=False)

But I got a table of size (n × 1), so the number of columns was reduced.

Any suggestions on how to combine groupby and duplicated in Python pandas to get the same result as with R dplyr?

Answer 1

You can simply use the drop_duplicates function, like this:

# Keep the first row for each (product, crop, trade) combination
product_table.drop_duplicates(subset=['product', 'crop', 'trade'], inplace=True, ignore_index=True)
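As a quick sanity check, here is a sketch on hypothetical rows (not the asker's data) comparing a group-wise filter with `drop_duplicates`. The group-wise filter keeps the first row of every (product, crop, trade) combination via `cumcount() == 0`. It only matches `drop_duplicates` when `trade` is part of `subset`; with `subset=['product', 'crop']` alone, rows that differ only in `trade` are dropped as well:

```python
import pandas as pd

# Hypothetical rows: FUNGI/SEED appears with two different trades plus one exact repeat
df = pd.DataFrame({
    "product": ["Fungi", "ASH", "FUNGI", "FUNGI", "FUNGI"],
    "trade":   ["VIC", "CAN", "CAN", "CAN", "VIC"],
    "crop":    ["Grapes", "APPLE", "SEED", "SEED", "SEED"],
})

# Group-wise version of dplyr's filter(!duplicated(trade)):
# cumcount() numbers rows within each (product, crop, trade) group, so 0 marks the first one
groupwise = df[df.groupby(["product", "crop", "trade"]).cumcount() == 0]

# drop_duplicates keeps the first occurrence by default (keep="first")
with_trade = df.drop_duplicates(subset=["product", "crop", "trade"])
without_trade = df.drop_duplicates(subset=["product", "crop"])

print(groupwise.equals(with_trade))  # True
print(len(without_trade))            # 3: the FUNGI/VIC/SEED row is lost too
```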

Answer 2

You can use datar to write it just like in R:

>>> from datar.all import f, tribble, duplicated, distinct, group_by, filter
>>> 
>>> df = tribble(
...     f.product, f.trade, f.crop,
...     "Fungi",   "VIC",   "Grapes",
...     "ASH",     "CAN",   "APPLE",
...     "FUNGI",   "CAN",   "SEED",
...     "FUNGI",   "CAN",   "SEED",
... )
>>> 
>>> df >> group_by(f.product, f.crop) >> filter(~duplicated(f.trade))
   product    trade     crop
  <object> <object> <object>
0    Fungi      VIC   Grapes
1      ASH      CAN    APPLE
2    FUNGI      CAN     SEED

[Groups: product, crop (n=3)]

But what you are actually looking for is distinct:

>>> df >> group_by(f.product, f.crop) >> distinct(f.trade)
   product    trade     crop
  <object> <object> <object>
0      ASH      CAN    APPLE
1    FUNGI      CAN     SEED
2    Fungi      VIC   Grapes

[Groups: product, crop (n=3)]
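Without datar, a rough plain-pandas counterpart of `distinct(trade)` on the grouped frame is to keep only the group keys plus `trade` and deduplicate those columns. This is a sketch, not datar's implementation; the `sort_values` call is added only so the row order mirrors the output shown above, and that ordering is an assumption rather than something `distinct` guarantees:

```python
import pandas as pd

# Same four rows as in the datar example above
df = pd.DataFrame({
    "product": ["Fungi", "ASH", "FUNGI", "FUNGI"],
    "trade":   ["VIC", "CAN", "CAN", "CAN"],
    "crop":    ["Grapes", "APPLE", "SEED", "SEED"],
})

# Unique (product, crop, trade) combinations, sorted by the group keys
distinct_rows = (
    df[["product", "crop", "trade"]]
    .drop_duplicates()
    .sort_values(["product", "crop"])
    .reset_index(drop=True)
)
print(distinct_rows)
```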
