Building a weighted histogram using two binary files

Question

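I have two binary files that I need to iterate over in lockstep, so that each value read from one file is matched with the value at the same position in the other file. I am sorting the values from one file into histogram bins, and the values from the other file serve as the weights of those values.

I tried the following code: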

import math
import numpy as np
import struct
import matplotlib.pyplot as plt

low = np.inf
high = -np.inf

struct_fmt = 'f'
struct_len = struct.calcsize(struct_fmt)
struct_unpack = struct.Struct(struct_fmt).unpack_from

file = "/projects/current/real-core-snaps/core4_256_velx_0009.bin"
file2 = "/projects/current/real-core-snaps/core4_256_dens_0009.bin"

def read_chunks(f, length):
    while True:
        data = f.read(length)
        if not data: break
        yield data

loop = 0

with open(file,"rb") as f:
    for chunk in read_chunks(f, struct_len):   
        x = struct_unpack(chunk)
        low = np.minimum(x, low)
        high = np.maximum(x, high)
        loop += 1

nbins = math.ceil(math.sqrt(loop)) 

bin_edges = np.linspace(low, high, nbins + 1)
total = np.zeros(nbins, np.int64)


f = open(file,"rb")
f2 = open(file2,"rb")

for chunk1,chunk2 in zip(read_chunks(f, struct_len),read_chunks(f2, struct_len)):
    subtotal,e = np.histogram(struct_unpack(chunk1),bins=bin_edges,weights=struct_unpack(chunk2))
    total = np.add(total,subtotal,out=total,casting="unsafe")

plt.hist(bin_edges[:-1], bins=bin_edges, weights=total)
plt.savefig('hist-veldens.svg')

However, the resulting histogram is absurd (see below). What am I doing wrong?

The data files are available at https://drive.google.com/file/d/1fhia2CGzl_aRX9Q9Ng61W-4XJGQe1OCV/view?usp=sharing and https://drive.google.com/file/d/1CrhQjyG2axSFgK9LGytELbxjy3Ndon1S/view?usp=sharing.

Answer 1

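The error is that total = np.zeros(nbins, np.int64) gives every element of the array total an integer type. In a weighted histogram, subtotal does not hold counts but sums of float weights, so total should be of float type as well. With an int64 accumulator and casting="unsafe", the fractional part of every sum is truncated each time a subtotal is added.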
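To illustrate the point, here is a minimal sketch, not the asker's exact script. It first shows the truncation caused by an int64 accumulator, then accumulates into a float64 array instead. The function name weighted_histogram and the parameter chunk_items are hypothetical, and the sketch assumes both files contain raw native-endian 32-bit floats (as struct format 'f' implies) with the same number of items. It also reads many values per chunk rather than one, which is an optional change and not part of the fix itself.

```python
import numpy as np

# The bug in isolation: float sums written into an int64 array are truncated.
int_total = np.zeros(2, np.int64)
np.add(int_total, np.array([0.7, 1.9]), out=int_total, casting="unsafe")
# int_total now holds 0 and 1; the fractional parts are lost.


def weighted_histogram(values_path, weights_path, chunk_items=65536):
    """Weighted histogram of two aligned raw float32 files (hypothetical helper)."""
    chunk_bytes = chunk_items * np.dtype(np.float32).itemsize

    # Pass 1: find the value range and the number of items.
    low, high, count = np.inf, -np.inf, 0
    with open(values_path, "rb") as f:
        while True:
            data = f.read(chunk_bytes)
            if not data:
                break
            v = np.frombuffer(data, dtype=np.float32)
            low = min(low, float(v.min()))
            high = max(high, float(v.max()))
            count += v.size

    nbins = int(np.ceil(np.sqrt(count)))
    bin_edges = np.linspace(low, high, nbins + 1)
    total = np.zeros(nbins, dtype=np.float64)  # float accumulator, not int64

    # Pass 2: read both files in lockstep and accumulate the weighted counts.
    with open(values_path, "rb") as fv, open(weights_path, "rb") as fw:
        while True:
            dv = fv.read(chunk_bytes)
            dw = fw.read(chunk_bytes)
            if not dv:
                break
            v = np.frombuffer(dv, dtype=np.float32)
            w = np.frombuffer(dw, dtype=np.float32).astype(np.float64)
            subtotal, _ = np.histogram(v, bins=bin_edges, weights=w)
            total += subtotal  # no unsafe cast needed: both sides are float64

    return total, bin_edges
```

The returned total and bin_edges can be passed to the same plt.hist(bin_edges[:-1], bins=bin_edges, weights=total) call used in the question.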

python

numpy

histogram