Python

Question

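I'm just getting started with Python and Anaconda. I'm trying to create a line plot similar to one I successfully produced in R. When I try to read a CSV file with the code below, I get the error ValueError: x and y must have same first dimension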

import csv
import matplotlib as mpl
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cbook as cbook

def getColumn(filename, column):
    results = csv.reader(open(filename), delimiter="\t")
    return [result[column] for result in results if len(result) > column]

Season = getColumn("vs.csv",0)
VORP = getColumn("vs.csv",2)

fig = plt.figure()
plt.figure("VORP vs Season")
plt.xlabel("Season")
plt.ylabel("VORP")
plt.legend(["PlayerA","PlayerB"], loc=9,ncol=2)
plt.plot(Season, VORP)
plt.show()

The CSV file contains only the following entries:

Season  Player   VORP
'0405'  PlayerA  .7
'0506'  PlayerA  .14
[and so on]
'0405'  PlayerB  .23
'0506'  PlayerB  -.3
[and so on]

Answer 1

One solution is to use the pandas data analysis library that comes with Anaconda. It is designed to provide some of the same functionality as R, so it may be a good option for you. It greatly simplifies importing and manipulating data from CSV files, and it has convenient plotting capabilities built on matplotlib.

First, import pandas and matplotlib.pyplot, then use the former to create a pandas.DataFrame object from your CSV. If you print the DataFrame to the console, you'll see that it is nicely formatted.

>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>>
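>>> # read_csv replaces DataFrame.from_csv, which was removed in pandas 1.0;
>>> # pass sep='\t' if the file is tab-delimited
>>> df = pd.read_csv('vorp.csv')
>>> print(df)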

   Season   Player  VORP
0  '0405'  PlayerA  0.70
1  '0506'  PlayerA  0.14
2  '0405'  PlayerB  0.23
3  '0506'  PlayerB -0.30
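To try the loading step without a file on disk, here is a minimal sketch that feeds the four sample rows to read_csv from an in-memory string. The comma-separated layout and the SAMPLE_CSV name are assumptions made for illustration; the real file may use a different separator.

```python
import io

import pandas as pd

# Hypothetical stand-in for 'vorp.csv': the four sample rows, comma-separated.
SAMPLE_CSV = """Season,Player,VORP
'0405',PlayerA,.7
'0506',PlayerA,.14
'0405',PlayerB,.23
'0506',PlayerB,-.3
"""

# read_csv accepts any file-like object, so StringIO can stand in for a path.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Season and Player stay text (single quotes included); VORP is parsed as float.
print(df.dtypes)
print(df)
```

Because VORP is already numeric after loading, no manual string-to-float conversion is needed before plotting.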

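Now call the pivot_table method on the DataFrame. This simply returns another DataFrame object, but one organized in a way that makes it easy to plot. You need to set 'VORP' as the values, 'Season' as the index (i.e. the rows), and 'Player' as the columns, like so: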

>>> table = df.pivot_table(values='VORP', index='Season', columns='Player')
>>> print(table)

Player  PlayerA  PlayerB
Season                  
'0405'     0.70     0.23
'0506'     0.14    -0.30
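One detail worth knowing about pivot_table is that it aggregates duplicate (index, column) pairs, using the mean by default. The sketch below rebuilds the sample data inline and then adds one extra row to show the averaging; that extra row is hypothetical and not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({
    "Season": ["'0405'", "'0506'", "'0405'", "'0506'"],
    "Player": ["PlayerA", "PlayerA", "PlayerB", "PlayerB"],
    "VORP": [0.70, 0.14, 0.23, -0.30],
})

# One row per season, one column per player.
table = df.pivot_table(values="VORP", index="Season", columns="Player")
print(table)

# Hypothetical extra row (not in the original data): a second '0405' entry for PlayerA.
extra = pd.DataFrame({"Season": ["'0405'"], "Player": ["PlayerA"], "VORP": [0.90]})
averaged = pd.concat([df, extra], ignore_index=True).pivot_table(
    values="VORP", index="Season", columns="Player"
)

# The '0405' / PlayerA cell is now the mean of 0.70 and 0.90.
print(averaged.loc["'0405'", "PlayerA"])
```

If each player has exactly one row per season, as in the sample, the averaging has no visible effect and the table simply reshapes the data.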

Now all that's left is to plot the table. Just call the plot method on your pivot table (which returns a matplotlib Axes object), then use matplotlib to adjust it however you like. For example, I added a y-axis label and a title.

>>> ax = table.plot()
>>> ax.set_title('VORP vs Season')
>>> ax.set_ylabel('VORP')
>>> plt.show()
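For running the same steps as a script rather than in an interactive session, a sketch along these lines may help. It assumes no display is available, so it selects the non-interactive Agg backend and saves the figure instead of calling plt.show(). The output file name and the marker style are illustrative choices, not part of the original answer.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: no display available, so render off-screen
import pandas as pd

df = pd.DataFrame({
    "Season": ["'0405'", "'0506'", "'0405'", "'0506'"],
    "Player": ["PlayerA", "PlayerA", "PlayerB", "PlayerB"],
    "VORP": [0.70, 0.14, 0.23, -0.30],
})
table = df.pivot_table(values="VORP", index="Season", columns="Player")

# DataFrame.plot draws one line per column and builds the legend from the column names.
ax = table.plot(marker="o")
ax.set_title("VORP vs Season")
ax.set_xlabel("Season")
ax.set_ylabel("VORP")

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "vorp_vs_season.png")
ax.figure.savefig(out_path)
```

Because the legend comes from the column names, no separate plt.legend call is needed.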

The result is a line plot with one line per player; your full dataset will no doubt look better.
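The pandas route sidesteps the original error rather than explaining it. One plausible cause, which the question does not confirm, is that the file is separated by spaces rather than tabs. In that case csv.reader with delimiter="\t" sees each line as a single field, so the len(result) > column filter keeps every row for column 0 and no rows for column 2. The sketch below reproduces that mismatch on an in-memory copy of the sample rows and shows one way to parse them consistently; get_column and read_rows are illustrative helpers, not the asker's exact code.

```python
import csv
import io

# Hypothetical stand-in for "vs.csv", assuming space-separated columns.
SAMPLE = """Season  Player   VORP
'0405'  PlayerA  .7
'0506'  PlayerA  .14
'0405'  PlayerB  .23
'0506'  PlayerB  -.3
"""

def get_column(text, column, delimiter):
    """Same filtering logic as the question's getColumn, reading from a string."""
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [row[column] for row in rows if len(row) > column]

# With a tab delimiter each line is a single field, so column 2 never exists.
season = get_column(SAMPLE, 0, "\t")
vorp = get_column(SAMPLE, 2, "\t")
print(len(season), len(vorp))  # the two lengths differ, which is what plt.plot rejects

def read_rows(text):
    """Split on any whitespace, skip the header, keep only complete rows."""
    lines = text.splitlines()[1:]
    rows = [line.split() for line in lines]
    return [(r[0], r[1], float(r[2])) for r in rows if len(r) == 3]

rows = read_rows(SAMPLE)

# Filter per player so x and y always come from the same rows.
seasons_a = [s for s, p, v in rows if p == "PlayerA"]
vorp_a = [v for s, p, v in rows if p == "PlayerA"]
print(seasons_a, vorp_a)
```

Building x and y from the same filtered rows keeps their lengths equal, and converting VORP to float avoids plotting the values as text.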

Python - reading from CSV - ValueError: x and y must have same first dimension

