How to load calculation input from a text file with Python?

Question

I have a text file containing:

Number Name subject1 subject2 subject3 subject4 subject5
1234567 Jan 5 7 0 6 4
3526435 Marie 5 5 7 0 0
2230431 Kees 6 10 0 8 6
7685433 André 4 7 8 7 5
0364678 Antoinette 0 2 8 8 8
1424354 Jerôme 7 9 0 5 0
4536576 Kamal 8 0 8 7 8
1256033 Diana 0 0 0 0 0
5504657 Petra 6 6 7 0 6
9676575 Malika 0 6 0 0 8
0253756 Samira 3 8 6 7 10

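I want to calculate the average grade for each student (a grade of 0 is not included), and I want to calculate the average grade for each subject (again, not counting the 0s).

In my own code, I copied all the information and put it into a list.

The problem I am facing is that I need my Python program to read the text file and use the given numbers for the calculations.

So far I only have this: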

i = 0
file = open("resultaten.txt", "r")

for x in file:
    if i == 0:
        print("Lines: ")

    else:
        x = x.split()
        print(i, x)
    i +=1

How can I use specific values from a line of the text file in a calculation?

Thanks in advance.

Answer 1

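These kinds of operations are easier with a library designed for handling tabular data. Pandas is a good example, although getting into it can be a bit daunting, especially for those without much Python experience. Anyway, here is one way to achieve what (I think) you want, using pandas. Excluding the zero values makes it slightly more complicated, hence the cryptic code: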

# -*- coding: utf-8 -*-
# ^This line makes sure python is able to read some weird
# accented characters.

# Importing various libraries
import sys
import pandas as pd
import numpy as np

# Depending on your version of python, we need to import
# a different library for reading your input data as a
# string. This step is not required, you should probably
# use the pandas function called read_csv(), if you have
# your file stored locally.
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO

input_data = StringIO("""Number Name subject1 subject2 subject3 subject4 subject5
1234567 Jan 5 7 0 6 4
3526435 Marie 5 5 7 0 0
2230431 Kees 6 10 0 8 6
7685433 André 4 7 8 7 5
0364678 Antoinette 0 2 8 8 8
1424354 Jerôme 7 9 0 5 0
4536576 Kamal 8 0 8 7 8
1256033 Diana 0 0 0 0 0
5504657 Petra 6 6 7 0 6
9676575 Malika 0 6 0 0 8
0253756 Samira 3 8 6 7 10
""")

# Read data, specify that columns are delimited by space,
# using the sep= argument.
df = pd.read_csv(input_data, sep=" ")

# Find all columns that contain subject scores, based on their name.
# We just pick all columns whose name starts with the string "subject".
subject_columns = [c for c in df.columns if c.startswith("subject")]
print(subject_columns)

# Calculate the mean score for each subject by finding the sum of all
# scores for that subject, then dividing it by the number of data points
# for that subject that are greater than 0.
for subject in subject_columns:
    df["%s_mean" % subject] = float(df[subject].sum()) / float(len(df[subject].loc[df[subject] > 0]))

# Calculate mean for each student, without 0s
# The .replace(0, np.nan).count(axis=1) is just a trick to find the
# number of non-zero values in each row. In short, it replaces all
# values that are 0 with NaN, so that the count() function ignores
# those values when calculating the number of data points that are
# present in the dataset. I.e. it disregards values that are 0,
# so that they're excluded from the mean calculation.
# (np.nan is used because the np.NaN alias was removed in NumPy 2.0.)
df["student_mean"] = df[subject_columns].sum(axis=1) / df[subject_columns].replace(0, np.nan).count(axis=1)

# This just configures pandas to print all columns in our dataset,
# and not truncate the print-out to fit to the screen.
pd.set_option("display.max_columns", 1000)

# Print out our final dataframe.
print(df)

The final dataset looks like this:

     Number        Name  subject1  subject2  subject3  subject4  subject5  subject1_mean  subject2_mean  subject3_mean  subject4_mean  subject5_mean  student_mean
0   1234567         Jan         5         7         0         6         4            5.5       6.666667       7.333333       6.857143          6.875      5.500000
1   3526435       Marie         5         5         7         0         0            5.5       6.666667       7.333333       6.857143          6.875      5.666667
2   2230431        Kees         6        10         0         8         6            5.5       6.666667       7.333333       6.857143          6.875      7.500000
3   7685433       André         4         7         8         7         5            5.5       6.666667       7.333333       6.857143          6.875      6.200000
4    364678  Antoinette         0         2         8         8         8            5.5       6.666667       7.333333       6.857143          6.875      6.500000
5   1424354      Jerôme         7         9         0         5         0            5.5       6.666667       7.333333       6.857143          6.875      7.000000
6   4536576       Kamal         8         0         8         7         8            5.5       6.666667       7.333333       6.857143          6.875      7.750000
7   1256033       Diana         0         0         0         0         0            5.5       6.666667       7.333333       6.857143          6.875           NaN
8   5504657       Petra         6         6         7         0         6            5.5       6.666667       7.333333       6.857143          6.875      6.250000
9   9676575      Malika         0         6         0         0         8            5.5       6.666667       7.333333       6.857143          6.875      7.000000
10   253756      Samira         3         8         6         7        10            5.5       6.666667       7.333333       6.857143          6.875      6.800000

Note that you need the pandas module installed for this to work. You also need the numpy module.

Answer 2

You can index the result of x.split(), and I would avoid overwriting x.

y = x.split()
Number = y[0]
Name = y[1]
...

or

Number, Name, subject1, subject2, subject3, subject4, subject5 = x.split()

Then you can calculate the mean. You could try something like...

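    Number, Name, subject1, subject2, subject3, subject4, subject5 = x.split()
    subjects = [float(subject1), float(subject2), float(subject3), float(subject4), float(subject5)]
    total = 0  # named total to avoid shadowing the built-in sum()
    zero_count = 0
    for subject in subjects:
        total += subject
        if subject == 0:  # compare by value; `is` tests identity and never matches a float here
            zero_count += 1
    graded = len(subjects) - zero_count
    # this will print the mean (None if every grade is 0, to avoid dividing by zero)
    print(i, total / graded if graded else None)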

This block can replace the contents of your else statement; it prints the index and the mean, excluding the "0" grades.
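
Putting the pieces above together, the whole loop might be sketched as follows. This is only a minimal sketch: the function name student_averages and the choice to report None for a student with only zero grades are assumptions made for illustration, and the sample lines are a trimmed copy of the data in the question.

```python
import io


def student_averages(lines):
    """Return {name: mean of non-zero grades} for whitespace-separated rows.

    `lines` can be any iterable of strings, such as an open file.
    The first line is treated as the header and skipped.
    """
    averages = {}
    for index, line in enumerate(lines):
        if index == 0 or not line.strip():
            continue  # skip the header row and any blank lines
        number, name, *grades = line.split()
        non_zero = [float(g) for g in grades if float(g) != 0]
        # Assumption: a student with no non-zero grades is reported as None.
        averages[name] = sum(non_zero) / len(non_zero) if non_zero else None
    return averages


# Trimmed sample of the question's data, wrapped so it behaves like a file.
sample = io.StringIO(
    "Number Name subject1 subject2 subject3 subject4 subject5\n"
    "1234567 Jan 5 7 0 6 4\n"
    "1256033 Diana 0 0 0 0 0\n"
)
print(student_averages(sample))  # {'Jan': 5.5, 'Diana': None}
```

To run this on the real file, pass an open file object instead, for example the result of open("resultaten.txt", encoding="utf-8"). The UTF-8 encoding is an assumption based on the accented names in the data.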

Answer 3

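If we turn this into a dictionary, we will have more flexibility when working with the information. This can be done with a little effort. We can use the first line to create our keys, pair those keys with each of the other lines, and then zip each pair together to get a list of (key, value) tuples. From there we can use the dictionary constructor to create our list of dictionaries.

Now we just need to collect the subjects from each item in this list of dictionaries, map them to ints, and special-case a student who scored only 0s. Otherwise we filter the 0s out of the list and then calculate the average. Next, to get the average for each subject, we can extract all the values for that subject that are not 0, again map them to ints, and then calculate the average. I added some text formatting for appearance; it is not required. The process is the same for the remaining subjects, just swap out the subject.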

with open('text.txt') as f:
    content = [line.split() for line in f]

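# The first row holds the column names, which become the dictionary keys.
keys = content[0]

# Pair the header with every data row, zip each pair into (key, value)
# tuples, then build one dictionary per student.
lst = list(zip([keys]*(len(content)-1), content[1:]))
x = [zip(i[0], i[1]) for i in lst]
z = [dict(i) for i in x]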

print('Average Grades'.center(30))
for i in z:
    subs =[i['subject1'], i['subject2'], i['subject3'], i['subject4'], i['subject5']]
    subs = list(map(int, subs))
    if sum(subs) == 0:
        print('{:<10} average grade: {:>4}'.format(i['Name'], 0))
    else:
        subs = list(filter(lambda x: x >0, subs))
        avg = round(sum(subs)/len(subs), 2)
        print('{:<10} average grade: {:>4}'.format(i['Name'], avg))

sub1 = [i['subject1'] for i in z if i['subject1'] != '0']
sub1 = list(map(int, sub1))
sub1_avg = sum(sub1)/len(sub1)
print('\nAverage Grade for Subject 1: {}'.format(sub1_avg))

        Average Grades        
Jan        average grade:  5.5
Marie      average grade: 5.67
Kees       average grade:  7.5
André      average grade:  6.2
Antoinette average grade:  6.5
Jerôme     average grade:  7.0
Kamal      average grade: 7.75
Diana      average grade:    0
Petra      average grade: 6.25
Malika     average grade:  7.0
Samira     average grade:  6.8

Average Grade for Subject 1: 5.5
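
Since the process is the same for every subject, the per-subject step could also be written as a loop rather than repeated by hand. The following is a rough sketch, not part of the original answer: the function name subject_averages, the None result for a subject with no non-zero grades, and the two-subject sample are all assumptions for illustration. It expects a list of dictionaries shaped like z above.

```python
def subject_averages(rows):
    """Return {subject: mean of non-zero grades} for a list of row dicts.

    Each dict maps a column name to its string value, like the dictionaries
    in `z`. Only keys that start with "subject" are averaged.
    """
    averages = {}
    subjects = [key for key in rows[0] if key.startswith("subject")]
    for subject in subjects:
        grades = [int(row[subject]) for row in rows if int(row[subject]) != 0]
        # Assumption: a subject with no non-zero grades is reported as None.
        averages[subject] = sum(grades) / len(grades) if grades else None
    return averages


# Trimmed, two-subject sample built the same way as the list of dictionaries.
header = "Number Name subject1 subject2".split()
lines = ["1234567 Jan 5 7", "3526435 Marie 5 5", "1256033 Diana 0 0"]
rows = [dict(zip(header, line.split())) for line in lines]
print(subject_averages(rows))  # {'subject1': 5.0, 'subject2': 6.0}
```

Selecting the columns by the "subject" prefix keeps the loop independent of how many subjects the file contains.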
