xlabel and ylabel values are not sorted in matplotlib scatterplot

Question

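I have done a lot of tedious searching online and cannot figure out how to phrase the right question to get an answer for what I want to do.

I am trying to create a scatter plot with the P/E ratio on the y-axis and the dividend yield on the x-axis. I put the data into a CSV file and then imported each column into Python as a separate list.

In my resulting scatter plot, I am confused why the x-axis and y-axis are not sorted numerically. I think I have to turn the elements of the lists into floats and then sort them in some way before turning them into a scatter plot.

The other option I can think of is sorting the values in the process of creating the scatter plot.

Neither of these has worked out, and I have reached a dead end. Any help or a pointer in the right direction would be much appreciated, as I can only describe my problem but cannot seem to ask the right question in my searches.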

import csv
import matplotlib.pyplot as plt

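# one list per CSV column
symbol, index, dividend, pe = [], [], [], []

with open('xlv_xlu_combined_td.csv', 'r', newline='') as f:
    etf_data = csv.reader(f)
    # csv.reader yields each row as a list of strings
    for row in etf_data:
        symbol.append(row[0])
        index.append(row[1])
        dividend.append(row[2])
        pe.append(row[3])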

symbol.pop(0)
index.pop(0)
dividend.pop(0)
pe.pop(0)

indexes = [i.split('%', 1)[0] for i in index]
dividend_yield = [d.split('%', 1)[0] for d in dividend]
pe_ratio = [p.split('X', 1)[0] for p in pe]

x = dividend_yield[:5]
y = pe_ratio[:5]

plt.scatter(x, y, label='Healthcare P/E & Dividend', alpha=0.5)
plt.xlabel('Dividend yield')
plt.ylabel('Pe ratio')
plt.legend()
plt.show()

xlv_xlu_combined_td.csv

symbol,index,dividend,pe
JNJ,10.11%,2.81%,263.00X
UNH,7.27%,1.40%,21.93X
PFE,6.48%,3.62%,10.19X
MRK,4.96%,3.06%,104.92X
ABBV,4.43%,4.01%,23.86X
AMGN,3.86%,2.72%,60.93X
MDT,3.50%,2.27%,38.10X
ABT,3.26%,1.78%,231.74X
GILD,2.95%,2.93%,28.69X
BMY,2.72%,2.81%,97.81X
TMO,2.55%,0.32%,36.98X
LLY,2.49%,2.53%,81.83X

Answer 1

import matplotlib.pyplot as plt

# lists x and y from your CSV file with all of your data
x = [<some values>]
y = [<some values>]

plt.scatter(x, y)

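This gives you a plot in which the coordinates of each point are

(x[i], y[i])

As far as I know, it does not automatically sort the data for you before plotting. If you want sorted data, you first have to do something like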

x.sort()
y.sort()

Then store them in new variables and pass those to the scatter function.
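One caveat worth spelling out: sorting x and y separately reorders each list on its own, so x[i] and y[i] may no longer come from the same CSV row. A minimal sketch of sorting the points as pairs instead, assuming the values have already been converted to floats (the sample numbers are the first three rows of the CSV; the names `pairs`, `x_sorted`, and `y_sorted` are only illustrative):

```python
# Sample values taken from the first three rows of the CSV (already numeric).
x = [2.81, 1.40, 3.62]      # dividend yield
y = [263.00, 21.93, 10.19]  # P/E ratio

# Sort the (x, y) pairs by x so each point keeps its own y value.
pairs = sorted(zip(x, y))
x_sorted = [p[0] for p in pairs]
y_sorted = [p[1] for p in pairs]

print(x_sorted)  # [1.4, 2.81, 3.62]
print(y_sorted)  # [21.93, 263.0, 10.19]
```

For a scatter plot of numeric data the order of the points does not change where they land, so this kind of sorting is only needed if the order matters for something else, such as drawing a connecting line.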

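The other issue I see is that, in your scatter plot, the X-axis and Y-axis labels are not in the right order. I have never seen this before and I am not sure why it happens. Could you provide some code so that we can diagnose why this is happening?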

Answer 2

You need to convert the strings to numbers. Matplotlib treats strings as "categories" and plots them in the order you supply them.
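To make the difference concrete, here is a small pure-Python illustration. The helper `category_positions` is hypothetical and is not matplotlib's code; it only mimics the idea described above, assuming each distinct string is placed at the next integer position in order of first appearance.

```python
def category_positions(labels):
    """Illustrative stand-in: give each distinct string the next integer
    position in order of first appearance (not matplotlib's actual code)."""
    positions = {}
    for label in labels:
        if label not in positions:
            positions[label] = len(positions)
    return [positions[label] for label in labels]

pe_strings = ['263.00', '21.93', '10.19', '104.92', '23.86']

# As strings: positions follow list order, so '263.00' gets the lowest position.
print(category_positions(pe_strings))  # [0, 1, 2, 3, 4]

# As floats: the values themselves determine where points land on the axis.
pe_numbers = [float(s) for s in pe_strings]
print(pe_numbers)  # [263.0, 21.93, 10.19, 104.92, 23.86]
```

This is why the tick labels in the question appear in file order rather than numeric order.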

Answer 3

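I don't have enough rep to comment on OP's response to Jody's comment, but I wanted to add that this did fix my problem. If you have the same issue I did, where your DataFrame contains multiple types, use the following format to convert only one column: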

df["colName"] = pd.to_numeric(df["colName"])

Hope this helps someone.
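As a rough sketch of that one-column conversion, assuming a small made-up DataFrame with a text column and a numeric-looking string column (the column names mirror the question's CSV, and the `X` suffix has already been stripped here):

```python
import pandas as pd

# Hypothetical frame: 'symbol' must stay text, 'pe' holds numbers stored as strings.
df = pd.DataFrame({
    'symbol': ['JNJ', 'UNH', 'PFE'],
    'pe': ['263.00', '21.93', '10.19'],
})

# Convert only the 'pe' column; 'symbol' is left untouched.
df['pe'] = pd.to_numeric(df['pe'])

print(df.dtypes)
```

If the strings still carry a trailing symbol such as `%` or `X`, `pd.to_numeric` raises an error by default, so the symbol has to be removed first, for example with `df['pe'].str.rstrip('X')`.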

Answer 4

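The problem is that the values are of string type, so they are plotted in the order given in the list, not in numeric order.
The symbols must be removed from the end of the values, and the values must then be converted to a numeric type.

Using the `csv` module

Adding to the existing code

Given the existing code, it is easy to convert the values in the lists to float type with map().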

indexes = [i.split('%', 1)[0] for i in index]
dividend_yield = [d.split('%', 1)[0] for d in dividend]
pe_ratio = [p.split('X', 1)[0] for p in pe]

# add mapping values to floats after removing the symbols from the values
indexes = list(map(float, indexes))
dividend_yield = list(map(float, dividend_yield))
pe_ratio = list(map(float, pe_ratio))

# plot
x = dividend_yield[:5]
y = pe_ratio[:5]

plt.scatter(x, y, label='Healthcare P/E & Dividend', alpha=0.5)
plt.xlabel('Dividend yield')
plt.ylabel('Pe ratio')
plt.legend(bbox_to_anchor=(1, 1), loc='upper left')
plt.show()
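As an alternative sketch, the conversion could also happen while reading the file, using `csv.DictReader` from the standard library. The `io.StringIO` object below is only a stand-in for the real file so that the snippet runs on its own; it holds the header and first three rows of the CSV from the question.

```python
import csv
import io

# Stand-in for the file: the header and first three rows of the CSV above.
sample = io.StringIO(
    "symbol,index,dividend,pe\n"
    "JNJ,10.11%,2.81%,263.00X\n"
    "UNH,7.27%,1.40%,21.93X\n"
    "PFE,6.48%,3.62%,10.19X\n"
)

dividend_yield, pe_ratio = [], []
# DictReader consumes the header row, so no pop(0) is needed afterwards.
for row in csv.DictReader(sample):
    dividend_yield.append(float(row['dividend'].rstrip('%')))
    pe_ratio.append(float(row['pe'].rstrip('X')))

print(dividend_yield)  # [2.81, 1.4, 3.62]
print(pe_ratio)        # [263.0, 21.93, 10.19]
```

With a real file, `sample` would be replaced by a file object opened with `open('xlv_xlu_combined_td.csv', newline='')`.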

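Using `pandas`

Remove the trailing symbol from each string with col.str[:-1]
Convert the columns to float type with .astype(float)

Using pandas v1.2.4 and matplotlib v3.3.4
This option reduces the required code from 23 lines to 4.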

import pandas as pd

# read the file
df = pd.read_csv('xlv_xlu_combined_td.csv')

# remove the symbols from the end of the number and set the columns to float type;
# assigning column by column replaces each column, so the float dtype is kept
for col in df.columns[1:]:
    df[col] = df[col].str[:-1].astype(float)

# plot the first five rows of the two columns
ax = df.iloc[:5, 2:].plot(x='dividend', y='pe', kind='scatter', alpha=0.5,
                          xlabel='Dividend yield', ylabel='Pe ratio',
                          label='Healthcare P/E & Dividend')
ax.legend(bbox_to_anchor=(1, 1), loc='upper left')

Plot output of the two implementations

Note that the numbers are now in the correct order.
