Python - reading from CSV - ValueError: x and y must have same first dimension
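I'm getting started with Python and Anaconda. I'm trying to create a line plot similar to one I successfully produced with R. When I try to read a csv file with the code below, I get the error ValueError: x and y must have same first dimension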
import csv
import matplotlib as mpl
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cbook as cbook
def getColumn(filename, column):
    results = csv.reader(open(filename), delimiter="\t")
    return [result[column] for result in results if len(result) > column]
Season = getColumn("vs.csv",0)
VORP = getColumn("vs.csv",2)
fig = plt.figure()
plt.figure("VORP vs Season")
plt.xlabel("Season")
plt.ylabel("VORP")
plt.legend(["PlayerA","PlayerB"], loc=9,ncol=2)
plt.plot(Season, VORP)
plt.show()
The CSV file contains only the following entries:
Season Player VORP
'0405' PlayerA .7
'0506' PlayerA .14
[and so on]
'0405' PlayerB .23
'0506' PlayerB -.3
[and so on]
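The error means that the Season and VORP lists end up with different lengths. A minimal check, assuming vs.csv is actually separated by spaces rather than tabs (the question does not say which), is to run the same column-extraction logic on the sample rows and compare the lengths. `get_column` and `SAMPLE` are hypothetical names for this sketch; the logic mirrors `getColumn` above.

```python
import csv
import io

# Hypothetical sample: the question's rows, assumed to be space-separated.
SAMPLE = """Season Player VORP
'0405' PlayerA .7
'0506' PlayerA .14
'0405' PlayerB .23
'0506' PlayerB -.3
"""

def get_column(f, column, delimiter="\t"):
    # Same filter as getColumn: keep only rows long enough to have `column`.
    return [row[column] for row in csv.reader(f, delimiter=delimiter)
            if len(row) > column]

# With a tab delimiter each line is one field, so column 2 never exists.
season = get_column(io.StringIO(SAMPLE), 0)
vorp = get_column(io.StringIO(SAMPLE), 2)
print(len(season), len(vorp))  # prints: 5 0 -> plt.plot raises ValueError

# Splitting on the delimiter the file actually uses gives matching lengths
# (the header row is still included in both lists).
season = get_column(io.StringIO(SAMPLE), 0, delimiter=" ")
vorp = get_column(io.StringIO(SAMPLE), 2, delimiter=" ")
print(len(season), len(vorp))  # prints: 5 5
```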
One solution is to use the pandas data analysis library that comes with Anaconda. It provides much of the same functionality as R's data frames, so it may be a good option for you, and it greatly simplifies importing and manipulating data from csv files. It also has good plotting capabilities, which use matplotlib under the hood.
First, import pandas and matplotlib.pyplot, then use the former to create a pandas.DataFrame object from your csv. If you print the DataFrame to the console, you'll see that it is neatly formatted:
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>>
>>> df = pd.read_csv('vorp.csv')
>>> print(df)
Season Player VORP
0 '0405' PlayerA 0.70
1 '0506' PlayerA 0.14
2 '0405' PlayerB 0.23
3 '0506' PlayerB -0.30
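To try this without a file on disk, `read_csv` can read from an in-memory string. This sketch assumes the file is comma-separated, which is what `read_csv` expects by default; `CSV_TEXT` is a hypothetical stand-in for vorp.csv containing the four rows shown above.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of vorp.csv (comma-separated is an assumption).
CSV_TEXT = """Season,Player,VORP
'0405',PlayerA,.7
'0506',PlayerA,.14
'0405',PlayerB,.23
'0506',PlayerB,-.3
"""

# dtype=str keeps Season as text; without it an unquoted 0405 would be
# read as the integer 405.
df = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"Season": str})
print(df)
print(df.dtypes)
```

If the real file is tab-separated, passing `sep="\t"` to `read_csv` should handle it.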
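Now call the pivot_table method on the DataFrame. It simply returns another DataFrame object, but one organized in a way that makes it easy to plot. You want 'VORP' as the values, 'Season' as the index (i.e. the rows), and 'Player' as the columns, like this: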
>>> table = df.pivot_table(values='VORP', index='Season', columns='Player')
>>> print(table)
Player PlayerA PlayerB
Season
'0405' 0.70 0.23
'0506' 0.14 -0.30
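One detail worth knowing: `pivot_table` aggregates repeated (Season, Player) pairs, and its default `aggfunc` is the mean, whereas `DataFrame.pivot` raises a `ValueError` on duplicates. The data below is hypothetical, with one extra PlayerB row for '0506' added purely to show this behaviour.

```python
import pandas as pd

# Hypothetical long-format data with one duplicated (Season, Player) pair.
df = pd.DataFrame({
    "Season": ["'0405'", "'0506'", "'0405'", "'0506'", "'0506'"],
    "Player": ["PlayerA", "PlayerA", "PlayerB", "PlayerB", "PlayerB"],
    "VORP": [0.7, 0.14, 0.23, -0.3, -0.1],
})

# pivot_table averages the duplicated PlayerB/'0506' entries by default.
table = df.pivot_table(values="VORP", index="Season", columns="Player")
print(table)

# pivot does no aggregation, so the same duplicate makes it fail.
try:
    df.pivot(index="Season", columns="Player", values="VORP")
except ValueError as err:
    print("pivot failed:", err)
```

If a different summary is wanted for duplicates, `aggfunc` accepts other functions such as `"sum"` or `"max"`.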
Now all that's left is to plot the table. Just call the plot method on your pivot table (it returns a matplotlib Axes object), then customize it with matplotlib however you like. For example, I added a y-axis label and a title.
>>> ax = table.plot()
>>> ax.set_title('VORP vs Season')
>>> ax.set_ylabel('VORP')
>>> plt.show()
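Putting the steps together, a minimal end-to-end script might look like the following. It assumes a comma-separated file (inlined here as `CSV_TEXT`, a hypothetical stand-in for vorp.csv) and swaps `plt.show()` for `savefig` with the non-interactive Agg backend so it also runs without a display; the output filename is hypothetical. Because `DataFrame.plot` labels the legend from the column names, the manual `plt.legend(["PlayerA","PlayerB"], ...)` call from the question is no longer needed.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical in-memory copy of vorp.csv (comma-separated is an assumption).
CSV_TEXT = """Season,Player,VORP
'0405',PlayerA,.7
'0506',PlayerA,.14
'0405',PlayerB,.23
'0506',PlayerB,-.3
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
table = df.pivot_table(values="VORP", index="Season", columns="Player")

ax = table.plot()  # one line per Player column; legend comes from the columns
ax.set_title("VORP vs Season")
ax.set_xlabel("Season")
ax.set_ylabel("VORP")
ax.figure.savefig("vorp_vs_season.png")  # hypothetical output path
plt.close(ax.figure)
```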
Here is the result. Your full dataset will no doubt produce a more interesting plot.