Create a list every time I encounter a certain word in a str
My problem is that I want to write code that does this:

input => str_of_words = '<post>30blueyellow<post>2skyearth<post>5summerwinter'
output => post30 = ["blue", "yellow"]
post2 = ["sky", "earth"]
post5 = ["summer", "winter"]

At first I thought I could do something like

if "<post>" in str_of_words:
    occurrence = str_of_words.count("<post>")
    # and from there I had no idea how to continue coding it

So I figured I could ask whether anyone knows some tricks for doing this.
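
This might get you started:

import re

str_of_words = '<post>30blueyellow<post>2skyearth<post>5summerwinter'
posts = {}
# Splitting on the tag leaves an empty first item, which the match check skips.
lst = str_of_words.split('<post>')
for item in lst:
    # Leading digits are the post number; the non-digits after them are the words.
    match = re.match(r'(\d+)(\D+)', item)
    if not match:
        continue
    posts[int(match.group(1))] = match.group(2)
print(posts)

It prints:

{30: 'blueyellow', 2: 'skyearth', 5: 'summerwinter'}

So posts[30] = 'blueyellow'.

The re module is very handy for separating digits (\d) from non-digits (\D).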
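
As a variation on the loop above, the number and the letters after each tag can also be captured in one pass with re.findall. This is only a sketch: the helper name parse_posts is made up for illustration, and it assumes every post has the shape <post>, then digits, then non-digit text.

```python
import re

def parse_posts(text):
    """Map each post number to the raw run of letters that follows it."""
    # [^\d<]+ stops at the next digit or the next '<', so a tag is never swallowed.
    pairs = re.findall(r'<post>(\d+)([^\d<]+)', text)
    return {int(number): letters for number, letters in pairs}

str_of_words = '<post>30blueyellow<post>2skyearth<post>5summerwinter'
posts = parse_posts(str_of_words)
print(posts)  # {30: 'blueyellow', 2: 'skyearth', 5: 'summerwinter'}
```

This still leaves each value as one joined string; cutting 'blueyellow' into separate words needs some rule or word list.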

I don't know what rule you want to use to split the words. Do you have a list of the words that can occur?
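
You can use the nltk module:

import re
import nltk
nltk.download('words')
from nltk.corpus import words

# Build the vocabulary once; calling words.words() inside the loop is very slow.
vocab = set(words.words())

def split(a):
    # Try each split point and return the first one where both halves are known words.
    # Returns None if no such split exists.
    for i in range(1, len(a)):
        if a[:i] in vocab and a[i:] in vocab:
            return [a[:i], a[i:]]

str_of_words = '<post>30blueyellow<post>2skyearth<post>5summerwinter'
post = {i: split(j) for i, j in re.findall(r'post>(\d+)(\w+)', str_of_words)}

Note that the keys are strings here:

post['30']
['blue', 'yellow']

post['5']
['summer', 'winter']

post['2']
['sky', 'earth']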
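
If you already know which words can occur, the same idea works without downloading a corpus. The sketch below assumes a small hand-written vocabulary; KNOWN_WORDS, split_words and posts_to_lists are names invented for this example. Unlike the two-way split above, it breaks each run into as many known words as needed.

```python
import re

# Assumed vocabulary; replace it with the words that can actually occur.
KNOWN_WORDS = {"blue", "yellow", "sky", "earth", "summer", "winter"}

def split_words(run, vocab):
    """Return a list of vocabulary words that concatenate to run, or None."""
    if not run:
        return []
    for i in range(1, len(run) + 1):
        head = run[:i]
        if head in vocab:
            # Only accept this prefix if the remainder can be split too.
            rest = split_words(run[i:], vocab)
            if rest is not None:
                return [head] + rest
    return None

def posts_to_lists(text, vocab):
    """Map each post number (as an int) to its list of words."""
    return {int(number): split_words(run, vocab)
            for number, run in re.findall(r'<post>(\d+)([^\d<]+)', text)}

str_of_words = '<post>30blueyellow<post>2skyearth<post>5summerwinter'
post = posts_to_lists(str_of_words, KNOWN_WORDS)
print(post[30])  # ['blue', 'yellow']
```

A run that cannot be covered by the vocabulary maps to None. The backtracking is fine for short runs like these, but it can get slow on long strings with many overlapping words.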