Create a list every time I encounter a certain word in a str

My problem is that I want to write code that does this:

    input => str_of_words = '<post>30blueyellow<post>2skyearth<post>5summerwinter'
    output => post30 = ["blue","yellow"]
              post2 = ["sky","earth"]
              post5 = ["summer", "winter"]

At first I thought I could do something like:

     if "<post>" in str_of_words:
         occurrence = str_of_words.count("<post>")
         #and from there I had no idea how to continue coding it

So I figured I would ask whether anyone knows some tricks for doing this.

This might get you started:

import re

str_of_words = '<post>30blueyellow<post>2skyearth<post>5summerwinter'

posts = {}
lst = str_of_words.split('<post>')
for item in lst:
    match = re.match(r'(\d+)(\D+)', item)  # raw string, so \d and \D are not treated as string escapes
    if not match:
        continue
    posts[int(match.group(1))] = match.group(2)

print(posts)

It prints:

{30: 'blueyellow', 2: 'skyearth', 5: 'summerwinter'}

So posts[30] is 'blueyellow'.

The re module is very useful for separating digits (\d) from non-digits (\D).
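
As a variation on the same idea, the split and the match can probably be folded into a single re.findall call. This is only a sketch: parse_posts is a name made up for illustration, and it assumes the text after each number contains no digits and no '<' character.

```python
import re

def parse_posts(text):
    # Hypothetical helper: grab the digits after each '<post>' tag,
    # then the run of characters up to the next '<' or digit.
    pairs = re.findall(r'<post>(\d+)([^<\d]+)', text)
    return {int(number): body for number, body in pairs}

str_of_words = '<post>30blueyellow<post>2skyearth<post>5summerwinter'
print(parse_posts(str_of_words))
# {30: 'blueyellow', 2: 'skyearth', 5: 'summerwinter'}
```

The character class [^<\d] is used instead of \D because \D would also swallow the following '<post>' tag.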

I don't know what rule you want to use to split the words. Do you have a list of the words that can occur?
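
If you do have such a list, one possible approach is to try every prefix that is a known word and recurse on the rest. The sketch below assumes a small vocabulary, KNOWN_WORDS, invented here for illustration; segment is likewise a hypothetical helper, not something from a library.

```python
from functools import lru_cache

def segment(text, vocabulary):
    """Split text into words from vocabulary; return None if no split exists."""
    vocab = frozenset(vocabulary)

    @lru_cache(maxsize=None)
    def helper(start):
        # Reaching the end of the text means everything before it was split.
        if start == len(text):
            return ()
        # Try each known word that starts at this position.
        for end in range(start + 1, len(text) + 1):
            if text[start:end] in vocab:
                rest = helper(end)
                if rest is not None:
                    return (text[start:end],) + rest
        return None

    result = helper(0)
    return None if result is None else list(result)

# Assumed word list, purely for illustration.
KNOWN_WORDS = {"blue", "yellow", "sky", "earth", "summer", "winter"}
print(segment("blueyellow", KNOWN_WORDS))  # ['blue', 'yellow']
```

Unlike a single two-way split, this also handles strings made of more than two words, and it returns None when no combination of known words fits.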

You can use the nltk module:

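import re
import nltk
nltk.download('words')
from nltk.corpus import words

# Build the vocabulary once; set lookups are far faster than
# calling words.words() and scanning the list on every iteration.
vocabulary = set(words.words())

def split(a):
    # Try each split point and return the first one where both halves are known words.
    # Returns None if no such split exists.
    for i in range(1, len(a)):
        if a[:i] in vocabulary and a[i:] in vocabulary:
            return [a[:i], a[i:]]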


str_of_words = '<post>30blueyellow<post>2skyearth<post>5summerwinter'

post = {i:split(j) for i,j in dict(re.findall(r'post>(\d+)(\w+)',str_of_words)).items()}

post['30']
 ['blue', 'yellow']

post['5']
 ['summer', 'winter']

post['2']
 ['sky', 'earth']