How to generate a random sequence given a probability matrix of transitions?
The script below produces a probability matrix for a given list:
transitions = ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D']

def rank(c):
    return ord(c) - ord('A')

T = [rank(c) for c in transitions]

# create matrix of zeros
M = [[0]*4 for _ in range(4)]

# count each transition i -> j
for (i, j) in zip(T, T[1:]):
    M[i][j] += 1

# now convert to probabilities:
for row in M:
    n = sum(row)
    if n > 0:
        row[:] = [f/n for f in row]

# print M:
for row in M:
    print(row)
Output:
[0.0, 0.5, 0.0, 0.5]
[0.5, 0.25, 0.25, 0.0]
[0.0, 1.0, 0.0, 0.0]
[0.5, 0.0, 0.0, 0.5]
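I now want to do the opposite: make a new transition list of A B C D that follows the probability matrix.
How can I do this?

The choices function from the random library may help. Since the question does not say how to pick the first letter, it is chosen here with the same probabilities as the contents of the original list.
Since Python 3.6, random.choices accepts a weights parameter. Normalizing the weights is not strictly necessary.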
import random

# reuses transitions, rank and M from the question's script
letter = random.choice(transitions)  # take a starting letter with the same weights as the original list
new_list = [letter]
for _ in range(len(transitions) - 1):
    letter = chr(random.choices(range(4), weights=M[rank(letter)])[0] + ord('A'))
    new_list.append(letter)
print(new_list)
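To illustrate the remark about normalization, here is a small sketch that samples from the 'B' row of the question's matrix twice, once with the raw transition counts and once with the normalized probabilities. The helper name sample_frequencies, the sample size and the seed are made up for this illustration.

```python
import random
from collections import Counter

letters = ['A', 'B', 'C', 'D']
raw_counts = [2, 1, 1, 0]                 # transitions out of 'B' in the question's list
probabilities = [0.5, 0.25, 0.25, 0.0]    # the same row after normalization

def sample_frequencies(weights, n=10_000, seed=0):
    """Draw n letters with the given weights and return their relative frequencies."""
    rng = random.Random(seed)  # private generator so the sketch is repeatable
    counts = Counter(rng.choices(letters, weights=weights, k=n))
    return {letter: counts[letter] / n for letter in letters}

freq_from_counts = sample_frequencies(raw_counts)
freq_from_probs = sample_frequencies(probabilities)
print(freq_from_counts)
print(freq_from_probs)
```

Both runs should give frequencies close to 0.5, 0.25, 0.25 and 0, because random.choices only looks at the weights relative to their total.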
The complete code can be generalized somewhat to work with any kind of node, not just consecutive letters:
from collections import defaultdict
import random

transitions = ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D']
nodes = sorted(set(transitions))  # a list of all letters used

M = defaultdict(int)  # dictionary counting the occurrences of each transition (i, j)
for (i, j) in zip(transitions, transitions[1:]):
    M[(i, j)] += 1

# dictionary holding, for each node, a list of frequencies for the transition to a next node
T = {i: [M[(i, j)] for j in nodes] for i in nodes}

# node = random.choice(transitions)  # choose the first node randomly with the same probability as the original list
node = random.choice(nodes)  # choose the first node randomly, each node with equal probability
new_list = [node]
for _ in range(9):
    node = random.choices(nodes, T[node])[0]
    new_list.append(node)
print(new_list)
Example output: ['D', 'A', 'D', 'A', 'D', 'D', 'A', 'D', 'A', 'B']
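One case the generalized version does not cover is a node that only appears as the last element of the input: its row of frequencies is all zeros, and random.choices cannot draw from weights that sum to zero. A possible sketch that wraps the same idea in functions and stops early at such a dead end follows. The function names, the seed parameter and the stop-early policy are assumptions for illustration, not part of the answer above.

```python
import random
from collections import defaultdict

def build_transition_counts(sequence):
    """Return the sorted nodes and, per node, the list of transition counts to each node."""
    nodes = sorted(set(sequence))
    counts = defaultdict(int)
    for current, following in zip(sequence, sequence[1:]):
        counts[(current, following)] += 1
    table = {i: [counts[(i, j)] for j in nodes] for i in nodes}
    return nodes, table

def generate_sequence(sequence, length, seed=None):
    """Generate up to `length` nodes following the transition counts of `sequence`."""
    if length < 1:
        return []
    rng = random.Random(seed)
    nodes, table = build_transition_counts(sequence)
    node = rng.choice(nodes)  # uniform choice of the first node
    result = [node]
    while len(result) < length:
        weights = table[node]
        if sum(weights) == 0:
            # Dead end: this node has no observed successor, so stop early (assumed policy).
            break
        node = rng.choices(nodes, weights=weights)[0]
        result.append(node)
    return result

transitions = ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D']
print(generate_sequence(transitions, 10, seed=42))
print(generate_sequence(['A', 'B', 'C'], 10, seed=1))  # 'C' has no successor, so this stops early
```

Other policies, such as restarting from a random node, would fit in the same place as the break.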
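It looks to me like you are trying to build a Markov model.
As a bioinformatics student I happen to have some experience with (hidden) Markov models, so I would use nested dictionaries to make the matrix easier to work with. Note that I have imported numpy.random.
Hope this helps!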
import numpy.random as rnd

alphabet = ['A', 'B', 'C', 'D']
transitions = ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D']

# Create probability matrix filled with zeroes
# Matrix consists of nested dictionaries
prob_matrix = {}
for i in alphabet:
    prob_matrix[i] = {}
    for j in alphabet:
        prob_matrix[i][j] = 0.0

def rank(c):
    return ord(c) - ord('A')

# fill matrix with numbers based on transitions list
T = [rank(c) for c in transitions]
for (i, j) in zip(T, T[1:]):
    prob_matrix[alphabet[i]][alphabet[j]] += 1

# convert to probabilities
for row in prob_matrix:
    total = sum([prob_matrix[row][column] for column in prob_matrix[row]])
    if total > 0:
        for column in prob_matrix[row]:
            prob_matrix[row][column] /= total

# generate first random sequence letter (uniform over the alphabet)
outputseq = rnd.choice(alphabet)

# generate rest of string based on probability matrix
for i in range(11):
    probabilities = [prob_matrix[outputseq[-1]][j] for j in alphabet]
    outputseq += rnd.choice(alphabet, p=probabilities)

# output generated sequence
print(outputseq)
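Since the generated sequence is a Markov chain, one way to sanity-check a generator is to produce a long sequence and re-estimate the transition matrix from it; the estimate should land close to the matrix in the question. A minimal sketch using only the standard library follows. The helper names generate and estimate, the sequence length and the seed are chosen for illustration.

```python
import random

alphabet = ['A', 'B', 'C', 'D']
# Transition matrix from the question (row = current letter, column = next letter).
M = [
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.25, 0.25, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.5],
]

def generate(matrix, length, rng):
    """Walk the chain for `length` steps and return the visited state indices."""
    state = rng.randrange(len(matrix))  # uniform choice of the first state
    states = [state]
    for _ in range(length - 1):
        state = rng.choices(range(len(matrix)), weights=matrix[state])[0]
        states.append(state)
    return states

def estimate(states, size):
    """Re-estimate the transition probabilities from a sequence of state indices."""
    counts = [[0] * size for _ in range(size)]
    for i, j in zip(states, states[1:]):
        counts[i][j] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]

rng = random.Random(0)  # fixed seed so the check is repeatable
states = generate(M, 100_000, rng)
estimated = estimate(states, len(M))
for letter, row in zip(alphabet, estimated):
    print(letter, [round(p, 2) for p in row])
```

Entries that are zero in M stay exactly zero in the estimate, because a zero-weight transition is never drawn.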