Optimizing non-regularized data reading to image

Question

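I have some source data that is not regularized (a sample is shown in the csv variable in the code below). In this data I cannot guarantee any minimum, maximum, or step values, so I have to work them out from the source data itself.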

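After reading the data and defining the values needed to plot the image, I came up with the loop below. Running it on a real input (150k rows) showed that it is very slow: it took about 110 seconds (!!!) to render the whole image, and the image is very small.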

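Any hints are welcome, even if I have to use other libraries or data types. My main objective is to display "heat maps" from csv sources, including sources that can span a million rows. Reading the file into a dataset and plotting the graphic are both fast. The problem is creating the image map from the csv.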

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import io

csv = """
"X","Y","V"
1001,1001,909.630432
1001,1003,940.660156
1001,1005,890.571594
1001,1007,999.651062
1001,1009,937.775513
1003,1002,937.601074
1003,1004,950.006897
1003,1006,963.458923
1003,1008,878.646851
1003,1012,956.835938
1005,1001,882.472656
1005,1003,857.491028
1005,1005,907.293335
1005,1007,877.087891
1005,1009,852.005554
1007,1002,880.791931
1007,1004,862.990967
1007,1006,882.135864
1007,1008,896.634521
1007,1010,888.916626
1013,1001,853.410583
1013,1003,863.324341
1013,1005,843.284607
1013,1007,852.712097
1013,1009,882.543640
"""

data=io.StringIO(csv)

columns = [ "X" , "Y", "V" ]

df = pd.read_csv(data, sep=',', skip_blank_lines=True, quoting=2, skipinitialspace=True, usecols = columns, index_col=[0,1] ) 

# Fields
x_axis="X"
y_axis="Y"
val="V"

# Unique values on the X-Y axis
x_ind=df.index.get_level_values(x_axis).unique()
y_ind=df.index.get_level_values(y_axis).unique()

# Size of each axis
nx = len(x_ind)
ny = len(y_ind)

# Maxima and minima
xmin = x_ind.min()
xmax = x_ind.max()
ymin = y_ind.min()
ymax = y_ind.max()

img = np.zeros((nx,ny))

print("Entering in loop")
for ix in range(0, nx):
    print("Mapping {0} {1}".format(x_axis, ix))
    for iy in range(0, ny):
        try:
            img[ix,iy] = df.loc[ix+xmin,iy+ymin][val]
        except KeyError:
            img[ix,iy] = np.nan

plt.imshow(img, extent=[xmin, xmax, ymin, ymax], cmap=plt.cm.jet, interpolation=None)
plt.colorbar()
plt.show()

I tried using pcolormesh, but could not fit the values correctly into the mesh without a similar loop. I could not create z_mesh without the loop:

x_mesh,y_mesh = np.mgrid[xmin:xmax,ymin:ymax]
z_mesh = ?? hints ?? ;-)

Answer 1

I don't think your code even does what you want: when I ran it, I got only 14 valid points in the image.
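One way to see where that count comes from is to check which sample points fall inside the window the loop actually visits. The sketch below is only an illustration: the coordinates are copied from the question's sample, and `hits` is a name introduced here, not part of the original code.

```python
# Coordinates from the question's sample data (values omitted; only positions matter).
xs = [1001] * 5 + [1003] * 5 + [1005] * 5 + [1007] * 5 + [1013] * 5
ys = ([1001, 1003, 1005, 1007, 1009] + [1002, 1004, 1006, 1008, 1012]
      + [1001, 1003, 1005, 1007, 1009] + [1002, 1004, 1006, 1008, 1010]
      + [1001, 1003, 1005, 1007, 1009])

nx = len(set(xs))  # number of unique X values, as in the question
ny = len(set(ys))  # number of unique Y values, as in the question
xmin, ymin = min(xs), min(ys)

# The loop looks up (ix + xmin, iy + ymin), so it only covers
# X in [xmin, xmin + nx) and Y in [ymin, ymin + ny),
# not the full xmin..xmax by ymin..ymax span.
hits = [(x, y) for x, y in zip(xs, ys)
        if xmin <= x < xmin + nx and ymin <= y < ymin + ny]

print(len(hits), "of", len(xs), "points fall inside the looped window")
# prints: 14 of 25 points fall inside the looped window
```

Because the image is sized by the number of unique coordinates rather than by the coordinate range, every point with X above 1005 or Y above 1011 is never looked up.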

You can use pivot() or unstack(), followed by reindex(), to create the image. Is this what you want?

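# Reuses `csv` and `columns` from the question.
data=io.StringIO(csv)
df = pd.read_csv(data, sep=',', skip_blank_lines=True, quoting=2,
                 skipinitialspace=True, usecols = columns)
# quoting=2 (csv.QUOTE_NONNUMERIC) can make pandas parse the unquoted
# coordinates as floats; cast them to int so range() accepts them.
df[['X', 'Y']] = df[['X', 'Y']].astype(int)

# One row per Y, one column per X; cells with no sample become NaN.
img = df.pivot(index='Y', columns='X', values='V')
# Add the coordinates missing from the data so the grid is dense.
img = img.reindex(index=range(df['Y'].min(), df['Y'].max() + 1),
                  columns=range(df['X'].min(), df['X'].max() + 1))

# Pad by half a cell so each pixel is centred on its integer coordinate.
extent = [df['X'].min() - 0.5, df['X'].max() + 0.5,
          df['Y'].min() - 0.5, df['Y'].max() + 0.5]
plt.imshow(img, origin='lower', extent=extent)
plt.colorbar()
plt.show()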
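The unstack() route mentioned above, and a plain NumPy alternative, could look roughly like the sketch below. It is a sketch under assumptions rather than a tested recipe for the full data set: it assumes integer coordinates with no duplicate (X, Y) pairs, reads a shortened excerpt of the sample with default read_csv settings, and the names `csv_text`, `img_unstack` and `img_numpy` are made up for illustration.

```python
import io
import numpy as np
import pandas as pd

# Shortened excerpt of the question's sample, for illustration only.
csv_text = """"X","Y","V"
1001,1001,909.630432
1001,1003,940.660156
1003,1002,937.601074
1003,1004,950.006897
1005,1001,882.472656
1005,1003,857.491028
"""
df = pd.read_csv(io.StringIO(csv_text))

# Variant 1: unstack() a Series indexed by (Y, X), then reindex() to a dense grid.
ys = range(df["Y"].min(), df["Y"].max() + 1)
xs = range(df["X"].min(), df["X"].max() + 1)
img_unstack = (df.set_index(["Y", "X"])["V"]
                 .unstack("X")
                 .reindex(index=ys, columns=xs))

# Variant 2: plain NumPy. Start from an all-NaN grid and scatter the values
# into it with integer (fancy) indexing, using offsets from the minima.
x = df["X"].to_numpy()
y = df["Y"].to_numpy()
img_numpy = np.full((y.max() - y.min() + 1, x.max() - x.min() + 1), np.nan)
img_numpy[y - y.min(), x - x.min()] = df["V"].to_numpy()

# Both variants should describe the same grid (rows are Y, columns are X).
print(np.allclose(img_unstack.to_numpy(), img_numpy, equal_nan=True))
```

Either array can be passed to plt.imshow(..., origin='lower', extent=extent) in the same way as the pivot() result. The NumPy variant avoids building an index at all, which may matter for inputs approaching a million rows, though that has not been measured here.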

Tags: python, csv, optimization, matplotlib, imshow