Working with Two Different Input Files --- example: Hourly Data and Daily Data (with different lengths)

Question

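I'm writing some code to process a year of hourly and daily data, and I'm a bit confused about how to combine the data from the two files. What I'm doing is using the hourly pattern from dataset B but scaling it with the daily set A. So essentially (using the example below) I take the daily average of 93 cfs (dataset A) and multiply it by the 24 hours in a day, which gives 2232. Then I sum the hourly cfs values over all 24 hours of each day (dataset B); in this case, 1/1/2021 comes to 2596. Manipulating rates this way normally makes no sense, but here it doesn't matter because the units cancel. I then need to divide the first value by the second, 2232/2596 ≈ 0.8598, and apply that factor to the hourly cfs values for all 24 hours of the day (dataset B) to get a new "scaled" dataset (which will be dataset C).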

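My problem is that I have never coded in Python with two different input datasets (I'm a complete novice). I started experimenting with code, but I can't seem to integrate the two datasets. If anyone could point me in the right direction on how to integrate two separate input files, I would greatly appreciate it. Below the datasets is my attempt at the code. Note that the code is in the reverse order: it processes the hourly data (dataset B) first and then the daily data (dataset A). Printing the final scale factor (SF) gives me only one printout, not all 8,760, because I'm not inside a loop... but how can I be in a loop over both input files at the same time?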

Dataset A (daily) -- 365 rows of data:

1/1/2021 93 cfs
1/2/2021 0 cfs
1/3/2021 70 cfs
1/4/2021 70 cfs

Dataset B (hourly) -- 8,760 rows of data:

1/1/2021 0:00 150 cfs
1/1/2021 1:00 0 cfs
1/1/2021 2:00 255 cfs
(where the sum of all 24 hours for 1/1/2021 = 2596 cfs), and so on

Sorry if this is a really simple question... I'm very new to coding.

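Here is the code I've written so far. What I need is 8,760 rows of SF, which I can then multiply by the original dataset B. The final product, dataset C, will be date, time, and rescaled hourly data. I actually have to do this for a total of three units, giving me a 5-column by 8,760-row matrix, but I think I can figure out the units part. My problem right now is how to integrate the two datasets. Thanks for reading!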

print('Solving the Temperature Model programming problem')
fhand1 = open('Interpolate_CY21_short.txt')
fhand2 = open('WSE_Daily_CY21_short.txt')

#Hourly Interpolated Pardee PowerHouse Data
for line1 in fhand1:
    line1 = line1.rstrip()
    words1 = line1.split()
    #Hourly interpolated data - parsed down (cfs)
    x = float(words1[7])
    if x<100:
        x = 0
    #print(x)

#WSE Daily Average PowerHouse Data
for line2 in fhand2:
    line2 = line2.rstrip()
    words2 = line2.split()
    #Daily cfs average x 24 hrs
    aa = float(words2[2])*24
    #print(a)

SF = x * aa
print(SF)

Answer 1

This is how you would read the data into two lists:

# fhand1 is the hourly file, fhand2 is the daily-average file
fhand1 = open('Interpolate_CY21_short.txt', 'r')
fhand2 = open('WSE_Daily_CY21_short.txt', 'r')
daily = fhand1.readlines()
daily_average = fhand2.readlines()
fhand1.close()
fhand2.close()

# this is what the two lists would look like, roughly
# each line would be a separate string
daily_average = ["1/1/2021 93 cfs", "1/2/2021 0 cfs"]
daily = ["1/1/2021 0:00 150 cfs", "1/1/2021 1:00 0 cfs", "1/2/2021 1:00 0 cfs"]

Then, processing the lists could use a doubly nested for loop:


for average_line in daily_average:
    average_line = average_line.rstrip()
    average_date, average_count, average_symbol = average_line.split()

    for daily_line in daily:
        daily_line = daily_line.rstrip()
        date, hour, count, symbol = daily_line.split()
        if average_date == date:
            print(f"date={date}, average_count={average_count} count={count}")

Or dictionaries:


# populate data into dictionaries
daily_average_data = dict()
for line in daily_average:
    line = line.rstrip()
    day, count, symbol = line.split()
    daily_average_data[day] = (day, count, symbol)

daily_data = dict()
for line in daily:
    line = line.rstrip()
    day, hour, count, symbol = line.split()
    if day not in daily_data:
        daily_data[day] = list()
    daily_data[day].append((day, hour, count, symbol))

# now you can access daily_average_data and daily_data as
# dictionaries instead of files

# process data
result = list()
for date in daily_data.keys():
    print(date)
    print(daily_average_data[date])
    print(daily_data[date])
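
Building on the dictionary idea, here is a minimal sketch of the scaling step the question describes: for each day, the factor is (daily average x 24) divided by the sum of that day's hourly values, and every hourly value for that day is multiplied by it. The `scale_hourly` helper name, the made-up sample lines, and the choice to use a factor of 0 when a day's hourly values sum to 0 are assumptions for illustration only. It also assumes the three- and four-column line layout shown in the question's sample data.

```python
from collections import defaultdict

def scale_hourly(daily_average_lines, hourly_lines):
    """Return (date, hour, scaled_value) tuples for every hourly line."""
    # Daily average per date, e.g. {"1/1/2021": 93.0}
    daily_avg = {}
    for line in daily_average_lines:
        day, count, _symbol = line.split()
        daily_avg[day] = float(count)

    # Hourly values grouped by date, in file order
    hourly = defaultdict(list)
    for line in hourly_lines:
        day, hour, count, _symbol = line.split()
        hourly[day].append((hour, float(count)))

    scaled = []
    for day, rows in hourly.items():
        hourly_sum = sum(value for _hour, value in rows)
        # Assumption: a day whose hourly values sum to 0 gets a factor of 0
        factor = daily_avg[day] * 24 / hourly_sum if hourly_sum else 0.0
        for hour, value in rows:
            scaled.append((day, hour, value * factor))
    return scaled

# Tiny made-up sample (far fewer than 24 hours per day) just to show the shape
daily_average = ["1/1/2021 93 cfs", "1/2/2021 0 cfs"]
daily = ["1/1/2021 0:00 150 cfs", "1/1/2021 1:00 0 cfs", "1/2/2021 0:00 50 cfs"]
for row in scale_hourly(daily_average, daily):
    print(row)
```

Because the lookup is by date string, this version does not depend on the two files having the same number of lines; it raises a `KeyError` if an hourly date has no matching daily line.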

If the data items correspond line by line, you can use zip: https://realpython.com/python-zip-function/

Here is an example:

for data1, data2 in zip(daily_average, daily):
    print(f"{data1} {data2}")
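
Note that `zip` stops at the end of the shorter input, so pairing 365 daily lines directly with 8,760 hourly lines would silently drop most of the hourly data. One possible workaround, sketched below, is to repeat each daily line once per hour before zipping. This assumes the hourly file has the same number of lines for every day, in the same order as the daily file; `HOURS_PER_DAY` and the sample lists are illustrative and not from the original answer.

```python
from itertools import chain, repeat

HOURS_PER_DAY = 2  # use 24 for real data; 2 keeps the sample short

daily_average = ["1/1/2021 93 cfs", "1/2/2021 0 cfs"]
daily = [
    "1/1/2021 0:00 150 cfs", "1/1/2021 1:00 0 cfs",
    "1/2/2021 0:00 10 cfs", "1/2/2021 1:00 20 cfs",
]

# Repeat every daily line HOURS_PER_DAY times so both sequences line up
expanded_daily = chain.from_iterable(
    repeat(line, HOURS_PER_DAY) for line in daily_average
)

pairs = list(zip(expanded_daily, daily))
for data1, data2 in pairs:
    print(f"{data1} | {data2}")
```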

Answer 2

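Similar to what @oasispolo described, the solution is to create one loop and process both lists inside it. I personally don't like the "zip" function. (This is purely a stylistic objection; plenty of other people like it, and that's fine.)

Here is a solution with more intuitive syntax: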

print('Solving the Temperature Model programming problem')
fhand1 = open('Interpolate_CY21_short.txt', 'r')
fhand2 = open('WSE_Daily_CY21_short.txt', 'r')

# Convert each file into a list of lines. You're doing this
# implicitly, but I like to be explicit about it.
lines1 = fhand1.readlines()
lines2 = fhand2.readlines()

if len(lines1) != len(lines2):
    raise ValueError("The two files have different length!")

# Initialize an output array. You could also construct it
# one item at a time, but that can be slow for large arrays.
# It is more efficient to initialize the entire array at
# once if possible.
sf_list = [0]*len(lines1)

for position in range(len(lines1)):
    # range(L) generates numbers 0...L-1
    line1 = lines1[position].rstrip()
    words1 = line1.split()
    x = float(words1[7])
    if x<100:
        x = 0

    line2 = lines2[position].rstrip()
    words2 = line2.split()
    aa = float(words2[2])*24

    sf_list[position] = x * aa

print(sf_list)
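
Since the question's files have different lengths (365 daily lines versus 8,760 hourly lines), an equal-length check like the one above would raise an error on the real data. A possible adaptation, sketched here, treats each consecutive block of 24 hourly values as one day. It assumes the hourly file has exactly 24 lines per day in the same order as the daily file and that the values have already been parsed into floats. The `build_scale_factors` name is hypothetical, the factor follows the question's description (daily average x 24 divided by the day's hourly sum), and leaving the factor at 0 when the hourly sum is 0 is an arbitrary choice.

```python
def build_scale_factors(hourly_values, daily_values, hours_per_day=24):
    """Return one scale factor per hourly value."""
    if len(hourly_values) != hours_per_day * len(daily_values):
        raise ValueError("Expected hours_per_day hourly values per daily value")

    sf_list = [0.0] * len(hourly_values)
    for day_index, daily_average in enumerate(daily_values):
        start = day_index * hours_per_day
        stop = start + hours_per_day
        day_sum = sum(hourly_values[start:stop])
        # Assumption: leave the factor at 0 when the hourly values sum to 0
        if day_sum:
            factor = daily_average * hours_per_day / day_sum
            for position in range(start, stop):
                sf_list[position] = factor
    return sf_list

# Made-up example with 2 "hours" per day to keep it short
hourly = [150.0, 50.0, 0.0, 0.0]
daily = [100.0, 70.0]
sf = build_scale_factors(hourly, daily, hours_per_day=2)
scaled = [hourly[i] * sf[i] for i in range(len(hourly))]
print(sf)
print(scaled)
```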
