Get Bokeh's selection in notebook

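I want to select some points on a plot (e.g. with box_select or lasso_select) and retrieve them in a Jupyter notebook for further data exploration. How can I do that?

For instance, in the code below, how do I export the selection from Bokeh to the notebook? If I need a Bokeh server, that is fine too (I saw in the docs that I can add "two-way communication" with a server, but I did not manage to adapt the example to reach my goal).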

from random import random
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models.sources import ColumnDataSource

output_notebook()

x = [random() for x in range(1000)]
y = [random() for y in range(1000)]

s = ColumnDataSource(data=dict(x=x, y=y))
fig = figure(tools=['box_select', 'lasso_select', 'reset'])
fig.circle("x", "y", source=s, alpha=0.6)

show(fig)
# Select on the plot
# Get selection in a ColumnDataSource, or index list, or pandas object, or etc.?

Answers

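To select some points on a plot and retrieve them in a Jupyter notebook, you can use a CustomJS callback.

Within the CustomJS callback's JavaScript code, you can access the Jupyter notebook kernel through IPython.notebook.kernel (a global exposed by the classic Notebook front end). You can then call kernel.execute(python_code) to run Python code and, for example, export data from the JavaScript side to the Jupyter notebook.

So a Bokeh server is not required for two-way communication between a Bokeh plot and a Jupyter notebook.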
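To make the mechanism concrete: the kernel receives a string of Python source and runs it in the notebook's namespace. The sketch below uses exec() as a stand-in for kernel.execute() (an assumption for illustration only; the real call sends the code to the kernel as a message), and run_in_fake_kernel is a hypothetical helper name. It also shows why the shape of the string matters: a JavaScript array concatenated with a string becomes comma-separated text, which Python reads as a tuple only when there are two or more values.

```python
def run_in_fake_kernel(python_code):
    """Run Python source in a fresh namespace and return selected_indices.

    exec() is only a stand-in for what kernel.execute() triggers in the kernel.
    """
    namespace = {}
    exec(python_code, namespace)
    return namespace["selected_indices"]


# JavaScript renders the array [3, 17, 42] as the text "3,17,42".
several = run_in_fake_kernel("selected_indices = " + "3,17,42")

# A single selected point arrives as "5", which Python reads as an int.
single = run_in_fake_kernel("selected_indices = " + "5")

# Wrapping the text in tuple([...]) gives a tuple for any number of points,
# including an empty selection.
wrapped = run_in_fake_kernel("selected_indices = tuple([" + "5" + "])")
empty = run_in_fake_kernel("selected_indices = tuple([" + "" + "])")

print(several)  # (3, 17, 42)
print(single)   # 5
print(wrapped)  # (5,)
print(empty)    # ()
```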

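Below, I have extended your example code to include a CustomJS callback that fires on a selection geometry event in the figure. Whenever a selection is made, the callback runs and exports the indices of the selected data points to a variable in the Jupyter notebook called selected_indices.

To obtain a ColumnDataSource that contains the selected data points, loop through the selected_indices tuple to build lists of the selected x and y values, then pass those lists to the ColumnDataSource constructor.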

from random import random
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models.sources import ColumnDataSource
from bokeh.models.callbacks import CustomJS

output_notebook()

x = [random() for x in range(1000)]
y = [random() for y in range(1000)]

s = ColumnDataSource(data=dict(x=x, y=y))

fig = figure(tools=['box_select', 'lasso_select', 'reset'])
fig.circle("x", "y", source=s, alpha=0.6)

# make a custom javascript callback that exports the indices of the selected points to the Jupyter notebook
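callback = CustomJS(args=dict(s=s), 
                    code="""
                         console.log('Running CustomJS callback now.');
                         var indices = s.selected.indices;
                         var kernel = IPython.notebook.kernel;
                         // wrap in tuple([...]) so that an empty or single-point
                         // selection still produces a valid Python tuple
                         kernel.execute("selected_indices = tuple([" + indices + "])")
                         """)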

# set the callback to run when a selection geometry event occurs in the figure
fig.js_on_event('selectiongeometry', callback)

show(fig)
# make a selection using a selection tool 

# inspect the selected indices
selected_indices

# use the indices to create lists of the selected values
x_selected, y_selected = [], []
for index in selected_indices:
    x_selected.append(s.data['x'][index])
    y_selected.append(s.data['y'][index])

# make a column data source containing the selected values
selected = ColumnDataSource(data=dict(x=x_selected, y=y_selected))

# inspect the selected data
selected.data
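The loop over the indices can also be written as a small reusable helper. This is only a sketch: select_rows is a hypothetical name, and it assumes the data is a dict of equal-length, integer-indexable columns (which is how s.data behaves in the example above).

```python
def select_rows(data, indices):
    """Return a dict holding only the rows at `indices` for every column."""
    return {name: [values[i] for i in indices] for name, values in data.items()}


# Toy data standing in for s.data
data = {"x": [0.1, 0.2, 0.3, 0.4], "y": [1.0, 2.0, 3.0, 4.0]}
selected_data = select_rows(data, (0, 2))

print(selected_data)  # {'x': [0.1, 0.3], 'y': [1.0, 3.0]}
```

In the notebook, the result could then be passed on as ColumnDataSource(data=select_rows(s.data, selected_indices)).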

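If you have a Bokeh server running, you can access the selection indices of a data source via datasource.selected.indices. Here is an example of how to do this (modified from the official Embed a Bokeh Server Into Jupyter example):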

from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.io import show, output_notebook

from bokeh.sampledata.sea_surface_temperature import sea_surface_temperature

output_notebook()

df = sea_surface_temperature.copy()[:100]
source = ColumnDataSource(data=df)

def bkapp(doc):

    plot = figure(x_axis_type='datetime', y_range=(0, 25), tools="lasso_select",
                  y_axis_label='Temperature (Celsius)',
                  title="Sea Surface Temperature at 43.18, -70.43")
    plot.circle('time', 'temperature', source=source)

    doc.add_root(plot)

show(bkapp)

After making a selection, you can get the selected data as follows:

selected_data = df.iloc[source.selected.indices]
print(selected_data)

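This should display the selected values.

Although it is outside the scope of this question, note that there is a disconnect between Jupyter notebooks and the interactive nature of Bokeh apps: this solution introduces state that the notebook does not save, so restarting it and executing all cells does not give the same result. One way to address this is to persist the selection with pickle: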

import os
import pickle

df = sea_surface_temperature.copy()[:100]
source = ColumnDataSource(data=df)
if os.path.isfile("selection.pickle"):
    with open("selection.pickle", mode="rb") as f:
        source.selected.indices = pickle.load(f)

... # interactive part

with open("selection.pickle", mode="wb") as f:
    pickle.dump(source.selected.indices, f)
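The load and save steps above could be wrapped in two small helpers so that each notebook cell stays short. This is a minimal sketch; load_selection and save_selection are hypothetical names, and the indices are stored as a plain list on the assumption that this is what source.selected.indices accepts.

```python
import os
import pickle


def load_selection(path):
    """Return the saved list of indices, or an empty list if no file exists yet."""
    if os.path.isfile(path):
        with open(path, mode="rb") as f:
            return pickle.load(f)
    return []


def save_selection(indices, path):
    """Persist the selected indices to `path` as a plain list."""
    with open(path, mode="wb") as f:
        pickle.dump(list(indices), f)
```

In the notebook this would be used as source.selected.indices = load_selection("selection.pickle") before the interactive part and save_selection(source.selected.indices, "selection.pickle") after it.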