How to create a scatterplot of data using `matplotlib.pyplot.scatter`

I have a question about `matplotlib.pyplot.scatter`.

First, I need to download the iris classification data and add the column headers.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    plt.style.use('seaborn')
    
    %matplotlib inline
    
    df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header = None)
    df_names = ['sepal length in cm', 'sepal width in cm', 'petal length in cm', 'petal width in cm', 'class']
    df.columns = df_names
    df

Second, I am supposed to create a scatter plot of the data using `matplotlib.pyplot.scatter` as follows:

    * for x and y coordinates use sepal length and width respectively
    * for size use the petal length
    * for alpha (opacity/transparency) use the petal width
    * illustrate iris belonging to each class by using 3 distinct colours (RGB for instance, but be creative if you want)
    * some columns will need to be scaled to be passed as parameters; you might also want to scale some other columns to increase the readability of the illustration.

Then I found this website: https://www.geeksforgeeks.org/matplotlib-pyplot-scatter-in-python/

After that, I adapted their example for my task:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    plt.style.use('seaborn')

    %matplotlib inline

    # dataset-df

    x1 = [4.3, 7.9, 5.84, 0.83, 0.7826]

    y1 = [2.0, 4.4, 3.05, 0.43, -0.4194]

    plt.scatter(x1, y1, c ="red",
                alpha = 1.0, 6.9, 3.76, 1.76, 0.9490,
                linewidth = 2,
                marker ="s",
                s = [1.0, 6.9, 3.76, 1.76, 0.9490])

    plt.xlabel("X-axis")
    plt.ylabel("Y-axis")
    plt.show()

However, I always get this error:

    File "C:\Users\felix\AppData\Local\Temp/ipykernel_32284/4113309647.py", line 21
        s = [1.0, 6.9, 3.76, 1.76, 0.9490])
                                          ^
    SyntaxError: positional argument follows keyword argument

Could you tell me how to fix this error and complete my task?

Also, I copied the data from iris.names:

1. Title: Iris Plants Database
    Updated Sept 21 by C.Blake - Added discrepency information

2. Sources:
     (a) Creator: R.A. Fisher
     (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
     (c) Date: July, 1988

3. Past Usage:
   - Publications: too many to mention!!!  Here are a few.
   1. Fisher,R.A. "The use of multiple measurements in taxonomic problems"
      Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions
      to Mathematical Statistics" (John Wiley, NY, 1950).
   2. Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis.
      (Q327.D83) John Wiley & Sons.  ISBN 0-471-22361-1.  See page 218.
   3. Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
      Structure and Classification Rule for Recognition in Partially Exposed
      Environments".  IEEE Transactions on Pattern Analysis and Machine
      Intelligence, Vol. PAMI-2, No. 1, 67-71.
      -- Results:
         -- very low misclassification rates (0% for the setosa class)
   4. Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule".  IEEE 
      Transactions on Information Theory, May 1972, 431-433.
      -- Results:
         -- very low misclassification rates again
   5. See also: 1988 MLC Proceedings, 54-64.  Cheeseman et al's AUTOCLASS II
      conceptual clustering system finds 3 classes in the data.

4. Relevant Information:
   --- This is perhaps the best known database to be found in the pattern
       recognition literature.  Fisher's paper is a classic in the field
       and is referenced frequently to this day.  (See Duda & Hart, for
       example.)  The data set contains 3 classes of 50 instances each,
       where each class refers to a type of iris plant.  One class is
       linearly separable from the other 2; the latter are NOT linearly
       separable from each other.
   --- Predicted attribute: class of iris plant.
   --- This is an exceedingly simple domain.
   --- This data differs from the data presented in Fishers article
    (identified by Steve Chadwick,  spchadwick@espeedaz.net )
    The 35th sample should be: 4.9,3.1,1.5,0.2,"Iris-setosa"
    where the error is in the fourth feature.
    The 38th sample: 4.9,3.6,1.4,0.1,"Iris-setosa"
    where the errors are in the second and third features.  

5. Number of Instances: 150 (50 in each of three classes)

6. Number of Attributes: 4 numeric, predictive attributes and the class

7. Attribute Information:
   1. sepal length in cm
   2. sepal width in cm
   3. petal length in cm
   4. petal width in cm
   5. class: 
      -- Iris Setosa
      -- Iris Versicolour
      -- Iris Virginica

8. Missing Attribute Values: None

Summary Statistics:
             Min  Max   Mean    SD   Class Correlation
   sepal length: 4.3  7.9   5.84  0.83    0.7826   
    sepal width: 2.0  4.4   3.05  0.43   -0.4194
   petal length: 1.0  6.9   3.76  1.76    0.9490  (high!)
    petal width: 0.1  2.5   1.20  0.76    0.9565  (high!)

9. Class Distribution: 33.3% for each of 3 classes.

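There is nothing wrong with the iris dataset; the problem is only the part of the scatter call where you set the alpha parameter. In `alpha = 1.0, 6.9, 3.76, 1.76, 0.9490`, only `1.0` is assigned to alpha. The remaining numbers are parsed as positional arguments that come after a keyword argument, which Python does not allow. You should change how you assign the value to that parameter: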

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    plt.style.use('seaborn')  # named 'seaborn-v0_8' in Matplotlib 3.6 and later

    %matplotlib inline

    # Values copied from the question
    x1 = [4.3, 7.9, 5.84, 0.83, 0.7826]
    y1 = [2.0, 4.4, 3.05, 0.43, -0.4194]

    # alpha is now a single keyword argument
    plt.scatter(x1, y1, c="red", alpha=1,
                linewidth=2,
                marker="s",
                s=[1.0, 6.9, 3.76, 1.76, 0.9490])

    plt.xlabel("X-axis")
    plt.ylabel("Y-axis")
    plt.show()
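To see why the original call fails before anything is plotted, here is a minimal sketch that only parses two call expressions with the standard `ast` module. The shortened call strings and the helper name `parses` are made up for illustration; nothing is executed, so `scatter`, `x` and `y` do not need to exist.

```python
import ast

# Shortened versions of the two calls; they are only parsed, never run.
broken = "scatter(x, y, alpha=1.0, 6.9, 3.76, s=[1.0, 6.9])"
fixed = "scatter(x, y, alpha=1.0, s=[1.0, 6.9])"


def parses(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError as err:
        # Report the parser's message for the rejected source.
        print(f"{source!r} rejected: {err.msg}")
        return False


print(parses(broken))  # the bare 6.9 and 3.76 follow the keyword argument alpha=1.0
print(parses(fixed))   # every argument after alpha=... is also a keyword argument
```

The error is raised while the code is being parsed, which is why the traceback points at the end of the call rather than at the `alpha` line.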

Note that alpha here takes a single number between 0 and 1, such as 0.9, 0.8, or even 0.823425, rather than a bare run of comma-separated values.

[Output: the resulting scatter plot]
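For the task itself, the values plotted should come from the DataFrame columns rather than from the summary statistics in iris.names. The following is one possible sketch, not the only way to do it. It assumes a hypothetical helper `rescale` for min-max scaling, a hypothetical colour mapping `CLASS_COLOURS`, and arbitrary target ranges for size and opacity. Per-point opacity is carried in RGBA colours, so the sketch does not depend on `alpha` accepting more than one value. The six rows are taken from iris.data only to keep the example self-contained; in the notebook you would pass the full `df` from the question.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import to_rgba

COLS = ['sepal length in cm', 'sepal width in cm',
        'petal length in cm', 'petal width in cm', 'class']

# A few rows of iris.data; replace `sample` with the full `df` in practice.
sample = pd.DataFrame([
    [5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
    [4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
    [7.0, 3.2, 4.7, 1.4, 'Iris-versicolor'],
    [6.4, 3.2, 4.5, 1.5, 'Iris-versicolor'],
    [6.3, 3.3, 6.0, 2.5, 'Iris-virginica'],
    [5.8, 2.7, 5.1, 1.9, 'Iris-virginica'],
], columns=COLS)

# Assumed colour choice: one base colour per class.
CLASS_COLOURS = {'Iris-setosa': 'red',
                 'Iris-versicolor': 'green',
                 'Iris-virginica': 'blue'}


def rescale(values, low, high):
    """Min-max scale `values` linearly into the range [low, high]."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.full_like(values, (low + high) / 2)
    return low + (values - values.min()) / span * (high - low)


def scatter_args(df):
    """Build marker sizes and per-point RGBA colours from the petal columns."""
    sizes = rescale(df['petal length in cm'], 20, 300)  # marker area in points**2
    alphas = np.clip(rescale(df['petal width in cm'], 0.2, 1.0), 0.0, 1.0)
    colours = [to_rgba(CLASS_COLOURS[c], alpha=a)
               for c, a in zip(df['class'], alphas)]
    return sizes, alphas, colours


sizes, alphas, colours = scatter_args(sample)

fig, ax = plt.subplots()
ax.scatter(sample['sepal length in cm'], sample['sepal width in cm'],
           s=sizes, c=colours)
ax.set_xlabel('sepal length in cm')
ax.set_ylabel('sepal width in cm')
plt.show()
```

The lower bound of 0.2 for opacity is a readability choice: scaling petal width all the way down to 0 would make the smallest flowers invisible.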