Python - reading from CSV - ValueError: x and y must have same first dimension

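I'm getting started with Python and Anaconda. I'm trying to create a line plot similar to one I produced successfully in R. When I try to read the csv file with the code below, I get the error ValueError: x and y must have same first dimension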

import csv
import matplotlib as mpl
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cbook as cbook

def getColumn(filename, column):
    results = csv.reader(open(filename), delimiter="\t")
    return [result[column] for result in results if len(result) > column]

Season = getColumn("vs.csv",0)
VORP = getColumn("vs.csv",2)

fig = plt.figure()
plt.figure("VORP vs Season")
plt.xlabel("Season")
plt.ylabel("VORP")
plt.legend(["PlayerA","PlayerB"], loc=9,ncol=2)
plt.plot(Season, VORP)
plt.show()

The CSV file contains only the following entries:

Season  Player   VORP
'0405'  PlayerA  .7
'0506'  PlayerA  .14
[and so on]
'0405'  PlayerB  .23
'0506'  PlayerB  -.3
[and so on]
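
For context on the error itself: matplotlib raises it when the x and y sequences passed to plot have different lengths. The sketch below reproduces one way that can happen with a getColumn-style helper. It is an assumption, since the full file isn't shown: the sample text and its short last row are hypothetical. Because the helper skips any row that has too few fields, a single short row (or a file that is not actually tab-delimited) makes column 0 and column 2 come back with different lengths.

```python
import csv
import io

# Hypothetical sample: tab-delimited, but the last row has no VORP field.
sample = (
    "Season\tPlayer\tVORP\n"
    "'0405'\tPlayerA\t.7\n"
    "'0506'\tPlayerA\n"
)

def get_column(text, column):
    """Same filtering logic as getColumn above, reading from a string."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    # Rows with too few fields are silently dropped for this column only.
    return [row[column] for row in rows if len(row) > column]

season = get_column(sample, 0)  # header + 2 data rows
vorp = get_column(sample, 2)    # header + 1 data row (the short row is skipped)

print(len(season), len(vorp))   # 3 2
```

Printing the two lengths before calling plt.plot is a quick way to confirm whether this is what is happening with the real file. Note also that the header row ends up in both lists, and every value is still a string.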

One solution is to use the pandas data analysis library that comes with Anaconda. It is meant to provide some of the same functionality as R, so it may be a good option for you. It greatly simplifies importing and manipulating data from csv files, and it has nice plotting capabilities built on matplotlib.

First, import pandas and matplotlib.pyplot, then use the former to create a pandas.DataFrame object from your csv. If you print the DataFrame to the console, you'll see that it is nicely formatted.

>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>>
>>> df = pd.read_csv('vorp.csv')  # DataFrame.from_csv was removed in pandas 1.0
>>> print(df)

   Season   Player  VORP
0  '0405'  PlayerA  0.70
1  '0506'  PlayerA  0.14
2  '0405'  PlayerB  0.23
3  '0506'  PlayerB -0.30
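
If the file is tab-delimited, as the delimiter="\t" in the question suggests, the separator has to be passed explicitly. Here is a minimal sketch, assuming tab-separated data; the in-memory string is a hypothetical stand-in for the real file so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for a tab-delimited file on disk.
tsv_text = (
    "Season\tPlayer\tVORP\n"
    "'0405'\tPlayerA\t.7\n"
    "'0506'\tPlayerA\t.14\n"
    "'0405'\tPlayerB\t.23\n"
    "'0506'\tPlayerB\t-.3\n"
)

# sep="\t" tells read_csv to split on tabs instead of commas.
df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# VORP is parsed as a numeric column; Season stays as text
# because the values carry literal single quotes.
print(df.dtypes)
```

With a real file you would pass the path instead of the StringIO object, for example pd.read_csv('vs.csv', sep='\t').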

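Now call the pivot_table method on the DataFrame. This simply returns another DataFrame object, but one organized in a way that makes it easy to plot. You want 'VORP' as the values, 'Season' as the index (i.e. the rows), and 'Player' as the columns, like so: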

>>> table = df.pivot_table(values='VORP', index='Season', columns='Player')
>>> print(table)

Player  PlayerA  PlayerB
Season                  
'0405'     0.70     0.23
'0506'     0.14    -0.30
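
One thing to keep in mind with a larger dataset: pivot_table aggregates duplicate (Season, Player) pairs, and its default aggfunc is the mean. The sketch below illustrates this with made-up data; the extra 0.9 row is hypothetical and exists only to show the averaging.

```python
import pandas as pd

# Made-up data; the second row duplicates ('0405', PlayerA) on purpose.
df = pd.DataFrame({
    "Season": ["'0405'", "'0405'", "'0506'", "'0405'", "'0506'"],
    "Player": ["PlayerA", "PlayerA", "PlayerA", "PlayerB", "PlayerB"],
    "VORP": [0.7, 0.9, 0.14, 0.23, -0.3],
})

# Duplicate Season/Player pairs are combined with the default aggfunc (mean).
table = df.pivot_table(values="VORP", index="Season", columns="Player")

# The two PlayerA rows for '0405' (0.7 and 0.9) collapse into one averaged cell.
print(table)
```

If averaging is not what you want, pass a different aggfunc, such as aggfunc='sum'.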

Now all that's left is to plot the table. Just call the plot method on your pivot table (which will return a matplotlib Axes object), then manipulate it with matplotlib however you like. For example, I added a y-axis label and a title.

>>> ax = table.plot()
>>> ax.set_title('VORP vs Season')
>>> ax.set_ylabel('VORP')
>>> plt.show()

Here is the result; your full dataset will no doubt look better.
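
For reference, the steps above can be put together into one script. This is a sketch under a few assumptions: the data is built in memory instead of read from the csv, the Agg backend is selected so the script runs without a display, and the output filename is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # Assumption: headless run; drop this line for interactive use.

import matplotlib.pyplot as plt
import pandas as pd

# In-memory copy of the four rows shown above (stand-in for reading the csv).
df = pd.DataFrame({
    "Season": ["'0405'", "'0506'", "'0405'", "'0506'"],
    "Player": ["PlayerA", "PlayerA", "PlayerB", "PlayerB"],
    "VORP": [0.70, 0.14, 0.23, -0.30],
})

# One row per season, one column per player.
table = df.pivot_table(values="VORP", index="Season", columns="Player")

# DataFrame.plot draws one line per column and returns the Axes.
ax = table.plot()
ax.set_title("VORP vs Season")
ax.set_ylabel("VORP")

# Hypothetical output filename; with an interactive backend use plt.show() instead.
ax.figure.savefig("vorp_vs_season.png")
```

Each player column becomes its own line, and the legend entries come from the column names, so there is no need to call plt.legend by hand.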