Counting The Repetitive Item in Specific Column

Question

Hi. I am a new Python user. I need to write a script that extracts data from a specific .txt file. The data in the file looks like this:

Milo    12345678901234  DN127   POTATO_123_456  
Milo    12345678901234  DN127   POTATO_123_456
Lamb    12345678901307  DN127   TOMATO_123_456
Lamb    12345678901618  DN127   TOMATO_123_456
Lamb    12345678901953  DN127   TOMATO_123_456
Milo    12345678902213  DN127   CHILI_789_0126  
Milo    12345678902822  DN127   BANANA_134-123

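The script should display only the lines that contain the word "Milo" and count the repeated items in the 4th column (column 3 when counting from zero). I managed to display the lines with the word "Milo", but I have no idea how to count the repeated words in the 4th column. This is what I have done so far: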

with open("food.txt") as food:
    for line in food:
        # Show only the lines that contain the word "Milo"
        if "Milo" in line:
            print(line.rstrip())

Answer 1

Using pandas:

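import pandas

# Columns are separated by runs of whitespace, so split on a regex instead of a single space
df = pandas.read_csv('food.txt', sep=r'\s+', header=None)
df.columns = ['Product', 'ID', 'Another ID', 'Some Code']

# Keep only the Milo rows; copy so the new column is added to an independent frame
df = df[df['Product'].isin(['Milo'])].copy()
# For every row, store how many Milo rows share the same 'Some Code'
df['Count of Repetitive Some Code'] = df.groupby('Some Code')['Some Code'].transform('count')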

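Legend:

Product is your column containing Milo, etc.

ID is your column containing 12345678901234, etc.

Another ID is your column containing DN127, etc.

Some Code is your column containing POTATO_123_456, etc. ==> the one you want to count.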
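
If pandas is not available, the same count can probably be done with the standard library alone. The following is only a minimal sketch, assuming the columns are separated by whitespace as in the sample from the question. The helper name count_codes is made up for illustration and is not part of any library.

```python
from collections import Counter


def count_codes(lines, product="Milo"):
    """Count 4th-column values among rows whose 1st column equals `product`.

    `count_codes` is a hypothetical helper name used only for this sketch.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()  # split on any run of whitespace
        # Skip blank or short lines, and rows for other products
        if len(fields) >= 4 and fields[0] == product:
            counts[fields[3]] += 1
    return counts


# Sample rows copied from the question
sample = """\
Milo    12345678901234  DN127   POTATO_123_456
Milo    12345678901234  DN127   POTATO_123_456
Lamb    12345678901307  DN127   TOMATO_123_456
Lamb    12345678901618  DN127   TOMATO_123_456
Lamb    12345678901953  DN127   TOMATO_123_456
Milo    12345678902213  DN127   CHILI_789_0126
Milo    12345678902822  DN127   BANANA_134-123
"""

counts = count_codes(sample.splitlines())
for code, n in counts.most_common():
    print(code, n)
```

To run it against the real file, an open file object can be passed in place of the sample lines, for example count_codes(food) inside the with open("food.txt") as food: block from the question.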
