Add filters to scatter plot based on a pandas dataframe

Question

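I want to make a scatter plot with filters from the following dataframe. Each row holds a player, team and season, together with the assisted and not-assisted points that basketball player scored over the whole season: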

player          team_name       season          assisted    notassisted
A. DANRIDGE     NACIONAL        Season_17_18    130         445
A. DANRIDGE     NACIONAL        Season_18_19    132         382
D. ROBINSON     TROUVILLE       Season_18_19    89          286
D. DAVIS        AGUADA          Season_18_19    101         281
E. BATISTA      WELCOME         Season_17_18    148         278
F. MARTINEZ     GOES            Season_18_19    52          259
D. ALVAREZ      AGUADA          Season_17_18    114         246
M. HICKS        H. MACABI       Season_17_18    140         245

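I want to put assisted points on the x axis and not-assisted points on the y axis. I also want to filter by season, team and player, so that when I select a particular team or player, their points appear in one color and all other points in gray. Likewise, if I select two or more players, I want to compare them (each in a different color) while the other points stay visible but grayed out. I would also like to compare players from two different teams, and to combine the filters.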

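I am learning data science. With the plotly express library I can make the scatter plot and filter by team, and I can compare two different teams (or seasons, or players).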

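But I can't find a clean way to add several filters, and I don't know how to highlight the selected points while graying out the others (without making them disappear).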

The code is as follows:

import plotly.express as px

fig = px.scatter(pointsperplayer, x='assisted', y='notassisted', hover_name='player', 
                 hover_data=['team_name','season'], color='season')
fig.show()

The resulting plot looks like this:

(image: resulting scatter plot)

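To sum up, I want three filters: one for season, one for team and one for player. Each filter should allow multiple selections, with the selected points shown in distinct colors and the remaining points in gray, so that I can compare the selection against the rest. I am not sure whether this is possible with plotly express or whether I should use a different library.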

Answer 1

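I could not manipulate the legend, but I was able to add filters using the dropdown widgets I found here. Depending on your IDE, you may need to use Jupyter to get the widgets to work; I ran into an issue where VSCode would not display them. The code below lets you filter by team name, season or player and compare two options within that filter. I hope it can be extended to fit your needs.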

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from ipywidgets import widgets


# First gather the data I need and choose the display colors
playerData = pd.read_csv("playerData.csv")
teamNames = playerData['team_name'].unique().tolist()
seasons = playerData['season'].unique().tolist()
players = playerData['player'].unique().tolist()
color1 = 'red'
color2 = 'blue'
color3 = 'gray'

# This creates the initial figure.
# Note that px.scatter generates multiple scatter plot 'traces'. Each trace contains 
# the data points associated with 1 team/season/player depending on what the property
# of 'color' is set to.
trace1 = px.scatter(playerData, x='assisted', y='notassisted', color='team_name')
fig = go.FigureWidget(trace1)

# Create all our drop down widgets
filterDrop = widgets.Dropdown(
    description='Filter:',
    value='team_name',
    options=['team_name', 'season','player']  
)
teamDrop1 = widgets.Dropdown(
    description='Team Name:',
    value='NACIONAL',
    options=list(playerData['team_name'].unique().tolist())  
)
teamDrop2 = widgets.Dropdown(
    description='Team Name:',
    value='NACIONAL',
    options=list(playerData['team_name'].unique().tolist())  
)
playerDrop1 = widgets.Dropdown(
    description='Player:',
    value='A. DANRIDGE',
    options=list(playerData['player'].unique().tolist())  
)
playerDrop2 = widgets.Dropdown(
    description='Player:',
    value='A. DANRIDGE',
    options=list(playerData['player'].unique().tolist())  
)
seasonDrop1 = widgets.Dropdown(
    description='Season:',
    value='Season_17_18',
    options=list(playerData['season'].unique().tolist())  
)
seasonDrop2 = widgets.Dropdown(
    description='Season:',
    value='Season_17_18',
    options=list(playerData['season'].unique().tolist())  
)

# This will be called when the filter dropdown changes. 
def filterResponse(change):
    # generate the new traces that are filtered by teamname, season, or player
    tempTrace = px.scatter(playerData, x='assisted', y='notassisted', color=filterDrop.value)
    with fig.batch_update():
        # Delete the old traces and add the new traces in one at a time
        fig.data = []
        for tr in tempTrace.data:
            fig.add_scatter(x=tr.x, y=tr.y, hoverlabel=tr.hoverlabel,
                            hovertemplate=tr.hovertemplate, legendgroup=tr.legendgroup,
                            marker=tr.marker, mode=tr.mode, name=tr.name)
    # Call response so that it will color the markers appropriately
    response(change)

# This is called by all the other drop downs
def response(change):
    # colorList holds one color string per trace. This relies on px.scatter
    # creating one trace per unique value, in the same order as unique() returns them.
    if filterDrop.value == 'team_name':
        colorList = [color1 if x == teamDrop1.value else color2 if x == teamDrop2.value else color3 for x in teamNames]
    elif filterDrop.value == 'season':
        colorList = [color1 if x == seasonDrop1.value else color2 if x == seasonDrop2.value else color3 for x in seasons]
    else:
        colorList = [color1 if x == playerDrop1.value else color2 if x == playerDrop2.value else color3 for x in players]
    with fig.batch_update():
        # Color each trace according to our chosen comparison traces
        for i in range(len(colorList)):
            fig.data[i].marker.color = colorList[i]

# These determine what function should be called when a drop down changes
teamDrop1.observe(response, names="value")
seasonDrop1.observe(response, names="value")
playerDrop1.observe(response, names="value")
teamDrop2.observe(response, names="value")
seasonDrop2.observe(response, names="value")
playerDrop2.observe(response, names="value")
filterDrop.observe(filterResponse, names="value")

# HBox and VBox are used to organize the other widgets and figures
container1 = widgets.HBox([filterDrop]) 
container2 = widgets.HBox([teamDrop1, seasonDrop1, playerDrop1])
container3 = widgets.HBox([teamDrop2, seasonDrop2, playerDrop2])
widgets.VBox([container1, container2, container3, fig])

The result looks like this:
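The answer above compares two options within a single filter. The question also asks for several selections per filter and for combining season, team and player. One possible way to get there, sketched below, is to compute a label column with pandas and let plotly express color by it. The helper names (`highlight_labels`, `color_map_for`, `build_figure`), the palette and the `"other"` label are assumptions made for this illustration and are not part of the answer's code. The sample rows are copied from the question's dataframe.

```python
import pandas as pd

OTHER = "other"      # label given to every row outside the selection (assumed name)
GRAY = "lightgray"
PALETTE = ["red", "blue", "green", "orange", "purple"]  # assumed palette

# A few rows copied from the question's dataframe
points = pd.DataFrame({
    "player": ["A. DANRIDGE", "A. DANRIDGE", "D. ROBINSON", "D. DAVIS", "D. ALVAREZ"],
    "team_name": ["NACIONAL", "NACIONAL", "TROUVILLE", "AGUADA", "AGUADA"],
    "season": ["Season_17_18", "Season_18_19", "Season_18_19",
               "Season_18_19", "Season_17_18"],
    "assisted": [130, 132, 89, 101, 114],
    "notassisted": [445, 382, 286, 281, 246],
})


def highlight_labels(df, seasons=(), teams=(), players=()):
    """Label rows that pass every non-empty filter; all other rows get OTHER."""
    mask = pd.Series(True, index=df.index)
    filters = (("season", seasons), ("team_name", teams), ("player", players))
    for column, chosen in filters:
        if len(chosen) > 0:  # an empty selection leaves that column unfiltered
            mask &= df[column].isin(list(chosen))
    labels = df["player"] + " | " + df["team_name"] + " | " + df["season"]
    return labels.where(mask, OTHER)


def color_map_for(labels):
    """Give each highlighted label a palette color (cycling) and OTHER gray."""
    chosen = [label for label in labels.unique() if label != OTHER]
    colors = {label: PALETTE[i % len(PALETTE)] for i, label in enumerate(chosen)}
    colors[OTHER] = GRAY
    return colors


def build_figure(df, **selections):
    """Scatter plot with the selection in color and the remaining points in gray."""
    import plotly.express as px  # imported here so the helpers above need only pandas

    labels = highlight_labels(df, **selections)
    colors = color_map_for(labels)
    # Put OTHER first so the gray trace is drawn underneath the colored ones
    order = [OTHER] + [label for label in colors if label != OTHER]
    return px.scatter(
        df.assign(highlight=labels), x="assisted", y="notassisted",
        color="highlight", color_discrete_map=colors,
        category_orders={"highlight": order},
        hover_name="player", hover_data=["team_name", "season"],
    )
```

For example, `build_figure(points, seasons=["Season_18_19"], teams=["AGUADA", "NACIONAL"]).show()` should color the two matching player-seasons and leave the rest gray. In a notebook, the three selections could come from `widgets.SelectMultiple` controls, whose `value` tuples can be passed straight to `build_figure`; redrawing on every change is simpler than recoloring traces in place, though likely slower for large dataframes.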
