使用两个二进制文件构建加权直方图
Building a weighted histogram using two binary files
我有两个二进制文件需要同时遍历,以便一个文件中生成的值正确(相同位置)对应另一个文件中生成的值。我将值分类到直方图箱中,一个文件中的值对应于另一个文件中值的权重。
我尝试了以下语法:
import numpy as np
import struct
import matplotlib.pyplot as plt
low = np.inf
high = -np.inf
struct_fmt = 'f'
struct_len = struct.calcsize(struct_fmt)
struct_unpack = struct.Struct(struct_fmt).unpack_from
file = "/projects/current/real-core-snaps/core4_256_velx_0009.bin"
file2 = "/projects/current/real-core-snaps/core4_256_dens_0009.bin"
def read_chunks(f, length):
while True:
data = f.read(length)
if not data: break
yield data
loop = 0
with open(file,"rb") as f:
for chunk in read_chunks(f, struct_len):
x = struct_unpack(chunk)
low = np.minimum(x, low)
high = np.maximum(x, high)
loop += 1
nbins = math.ceil(math.sqrt(loop))
bin_edges = np.linspace(low, high, nbins + 1)
total = np.zeros(nbins, np.int64)
f = open(file,"rb")
f2 = open(file2,"rb")
for chunk1,chunk2 in zip(read_chunks(f, struct_len),read_chunks(f2, struct_len)):
subtotal,e = np.histogram(struct_unpack(chunk1),bins=bin_edges,weights=struct_unpack(chunk2))
total = np.add(total,subtotal,out=total,casting="unsafe")
plt.hist(bin_edges[:-1], bins=bin_edges, weights=total)
plt.savefig('hist-veldens.svg')
但是生成的直方图很荒谬(见下文)。我做错了什么?
数据文件位于https://drive.google.com/file/d/1fhia2CGzl_aRX9Q9Ng61W-4XJGQe1OCV/view?usp=sharing and https://drive.google.com/file/d/1CrhQjyG2axSFgK9LGytELbxjy3Ndon1S/view?usp=sharing。
错误在于 total = np.zeros(nbins, np.int64)
将整数类型分配给数组 total
的每个元素。鉴于 subtotal
不包含加权直方图中的计数而是 float-type,总计也应为 float
.
类型
我有两个二进制文件需要同时遍历,以便一个文件中生成的值正确(相同位置)对应另一个文件中生成的值。我将值分类到直方图箱中,一个文件中的值对应于另一个文件中值的权重。
我尝试了以下语法:
import numpy as np
import struct
import matplotlib.pyplot as plt
low = np.inf
high = -np.inf
struct_fmt = 'f'
struct_len = struct.calcsize(struct_fmt)
struct_unpack = struct.Struct(struct_fmt).unpack_from
file = "/projects/current/real-core-snaps/core4_256_velx_0009.bin"
file2 = "/projects/current/real-core-snaps/core4_256_dens_0009.bin"
def read_chunks(f, length):
while True:
data = f.read(length)
if not data: break
yield data
loop = 0
with open(file,"rb") as f:
for chunk in read_chunks(f, struct_len):
x = struct_unpack(chunk)
low = np.minimum(x, low)
high = np.maximum(x, high)
loop += 1
nbins = math.ceil(math.sqrt(loop))
bin_edges = np.linspace(low, high, nbins + 1)
total = np.zeros(nbins, np.int64)
f = open(file,"rb")
f2 = open(file2,"rb")
for chunk1,chunk2 in zip(read_chunks(f, struct_len),read_chunks(f2, struct_len)):
subtotal,e = np.histogram(struct_unpack(chunk1),bins=bin_edges,weights=struct_unpack(chunk2))
total = np.add(total,subtotal,out=total,casting="unsafe")
plt.hist(bin_edges[:-1], bins=bin_edges, weights=total)
plt.savefig('hist-veldens.svg')
但是生成的直方图很荒谬(见下文)。我做错了什么?
数据文件位于https://drive.google.com/file/d/1fhia2CGzl_aRX9Q9Ng61W-4XJGQe1OCV/view?usp=sharing and https://drive.google.com/file/d/1CrhQjyG2axSFgK9LGytELbxjy3Ndon1S/view?usp=sharing。
错误在于 total = np.zeros(nbins, np.int64)
将整数类型分配给数组 total
的每个元素。鉴于 subtotal
不包含加权直方图中的计数而是 float-type,总计也应为 float
.