Selecting a word based on transition matrix weights in python

Question

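I am trying to select a likely next word based on the current word, using previously observed word pairs as "weights". I am running into trouble with np.random.choice() when it comes to actually picking the next word.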

import pandas as pd
import numpy as np

texty = "won't you celebrate with me what i have shaped into a kind of life i had no model born in babylon both nonwhite and woman what did i see to be except myself i made it up here on this bridge between starshine and clay my one hand holding tight my other hand come celebrate with me that everyday something has tried to kill me and has failed."

# https://www.poetryfoundation.org/poems/50974/wont-you-celebrate-with-me

words = texty.split()

# Creating the text-based transition matrix

x = pd.crosstab(pd.Series(words[1:],name='next'),
            pd.Series(words[:-1],name='word'),normalize=1)

print(x)

# Selecting the next word based on the current word.
# https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.choice.html

current = "and"

# this part isn't working--->
next = np.random.choice(current,1,current) # was "y"

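I don't know how to reference the transition matrix from here. I want the choice to be based on the probabilities of what occurred before. For example, the probability of "clay" following "and" is 33%.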

Answer 1

x is a Pandas DataFrame.

You can access any column of that DataFrame as if the column name were a key in a dictionary.

> print(x['won\'t'])
next
a            0.0
and          0.0
babylon      0.0
...
with         0.0
woman        0.0
you          1.0
Name: won't, dtype: float64

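A column is returned as a Pandas Series. If you select one column of the DataFrame (your transition matrix x), the index of the resulting Series holds the candidate next words from the text, and its values are the associated probabilities. You can pass both to np.random.choice to draw the next word, weighted by the probabilities from your transition matrix.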

> current_word = 'won\'t'
> current_column = x[current_word]
> next_word = np.random.choice(current_column.index,
                 p=current_column.values)
> print(next_word)
you
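
If it helps, the same idea can be wrapped into a small helper and used to walk several steps through the text. This is only a sketch under a few assumptions: the names next_word and generate are made up for illustration, np.random.default_rng is used in place of the legacy np.random.choice so the draw can be seeded, and a word with no column in x (here the final word "failed.", which is never followed by anything) simply ends the walk.

```python
import numpy as np
import pandas as pd

texty = ("won't you celebrate with me what i have shaped into a kind of life "
         "i had no model born in babylon both nonwhite and woman what did i "
         "see to be except myself i made it up here on this bridge between "
         "starshine and clay my one hand holding tight my other hand come "
         "celebrate with me that everyday something has tried to kill me "
         "and has failed.")
words = texty.split()

# Columns are current words, rows are next words; each column sums to 1.
x = pd.crosstab(pd.Series(words[1:], name='next'),
                pd.Series(words[:-1], name='word'), normalize='columns')


def next_word(current, matrix, rng):
    """Draw one next word, weighted by the column for `current`.

    Hypothetical helper: returns None when `current` has no column,
    i.e. it was never followed by another word in the text.
    """
    if current not in matrix.columns:
        return None
    column = matrix[current]
    return rng.choice(column.index.to_numpy(), p=column.to_numpy())


def generate(start, matrix, length, rng):
    """Walk the transition matrix from `start` for at most `length` words."""
    out = [start]
    while len(out) < length:
        candidate = next_word(out[-1], matrix, rng)
        if candidate is None:  # dead end: no observed successor
            break
        out.append(candidate)
    return out


rng = np.random.default_rng(0)  # seeded only to make runs repeatable
print(next_word("and", x, rng))           # one of: clay, has, woman
print(" ".join(generate("won't", x, 8, rng)))
```

Note that the variable is called candidate rather than next, since next is a Python builtin and the question's code shadows it.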
