How can I count the frequency of all values across several CSV columns?

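I have several CSV columns that store lottery numbers, along with some other information such as the draw date. I need a dictionary as my output. So far, I have only been able to print the number of occurrences in each column separately.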

# Import libraries
import pandas as pd

# Turn csv file into a pandas dataframe
df = pd.read_csv("LOTTOMAX.csv")

# Only select the columns I'm interested in. The csv file contains additional useless info.
selection = df[['NUMBER DRAWN 1', 'NUMBER DRAWN 2', 'NUMBER DRAWN 3', 'NUMBER DRAWN 4',
                'NUMBER DRAWN 5', 'NUMBER DRAWN 6', 'NUMBER DRAWN 7']]

# Loop over columns and apply value_counts(). Output to terminal.
for col in selection.columns:
    # Header line to make the terminal output more readable.
    print('-' * 40 + col + '-' * 40)
    # print() rather than display(), so the string is shown with real line breaks.
    print(selection[col].value_counts().to_string())

I did this project for fun; I wanted to replicate a feature from the BCLC website. Maybe this will help someone.

# Import libraries
import pandas as pd
from collections import Counter
import matplotlib.pyplot as plt

# Read csv file
# Source: https://www.playnow.com/resources/documents/downloadable-numbers/LOTTOMAX.zip
df = pd.read_csv("LOTTOMAX.csv")

cols = ['NUMBER DRAWN 1', 'NUMBER DRAWN 2', 'NUMBER DRAWN 3', 'NUMBER DRAWN 4',
'NUMBER DRAWN 5', 'NUMBER DRAWN 6', 'NUMBER DRAWN 7']

results = []

# Add data to a list
for i in cols:
    results += df[i].tolist()

# Count occurrences of each number across all selected columns
occurr = Counter(results)

# Display a bar chart of the frequencies
plt.bar(list(occurr.keys()), list(occurr.values()), color='g')
plt.xlabel("Numbers Drawn")
plt.ylabel("Frequency")
plt.show()

This solution is not perfect, but it works.
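Since the question asks for a dictionary, here is a minimal pandas-only sketch of the same counting step. It is one possible variant, not the only way to do it. The helper name `count_drawn_numbers` is hypothetical, the column names are the ones used above, and the two-row sample frame is made up purely for illustration (it is not real draw data).

```python
import pandas as pd

# Column names as used in the code above.
COLS = [f"NUMBER DRAWN {i}" for i in range(1, 8)]


def count_drawn_numbers(df, cols=COLS):
    """Hypothetical helper: return {number: count} over all given columns combined."""
    # Flatten the selected columns into a single one-dimensional Series.
    combined = pd.Series(df[cols].to_numpy().ravel())
    # value_counts() skips missing values by default; sort_index() orders by number.
    counts = combined.value_counts().sort_index()
    # Convert NumPy scalars to plain Python ints for a clean dictionary.
    return {int(number): int(count) for number, count in counts.items()}


if __name__ == "__main__":
    # Made-up sample with two draws, for illustration only.
    sample = pd.DataFrame(
        [[1, 2, 3, 4, 5, 6, 7], [1, 9, 3, 11, 12, 13, 14]],
        columns=COLS,
    )
    print(count_drawn_numbers(sample))
```

With the real file, the call would presumably be `count_drawn_numbers(pd.read_csv("LOTTOMAX.csv"))`. The resulting dictionary can be passed to `plt.bar` in the same way as the `Counter` above.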