How to find common words, sentences, or paragraphs across multiple paragraphs
I have the following sample paragraphs:
para1 = "this is para one. I am cat. I am 10 years old. I like fish"
para2 = "this is para two. I am dog. my age is 12. I can swim"
para3 = "this is para three. I am cat. I am 9 years. I like rat"
para4 = "this is para four. I am rat. my age is secret. I hate cat"
para5 = "this is para five. I am dog. I am 10 years old. I like fish"
I need the following result:
this is para
I am
I
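I tried Python's set data type, but the results were not what I wanted.
Is there a binary executable that would let me build a command line to do this task?
Hi, you can do the following: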
paragraph_lst = ["this is para one. I am cat. I am 10 years old. I like fish",
                 "this is para two. I am dog. my age is 12. I can swim",
                 "this is para three. I am cat. I am 9 years. I like rat",
                 "this is para four. I am rat. my age is secret. I hate cat",
                 "this is para five. I am dog. I am 10 years old. I like fish"]

# Phrases found in at least one pair of adjacent paragraphs
word_combinations = set()


def get_combinations(line1, line2, first=0, last=1, prvs_wrd=""):
    # Grow a window of words from line1 for as long as it is a substring of line2
    line_lst = line1.split(" ")
    if last > len(line_lst):
        return
    chk_list = line_lst[first:last]
    wrd = " ".join(str(x) for x in chk_list)
    if wrd in line2:
        # Still matching: remember this phrase and try one more word
        prvs_wrd = wrd
        get_combinations(line1, line2, first, last + 1, prvs_wrd)
    else:
        # Match broke: record the last matching phrase, restart after the failing word
        word_combinations.add(prvs_wrd)
        get_combinations(line1, line2, last, last + 1, prvs_wrd)


if __name__ == '__main__':
    # Compare each paragraph with the one that follows it
    for n, line in enumerate(paragraph_lst):
        if n + 1 < len(paragraph_lst):
            str1 = paragraph_lst[n]
            str2 = paragraph_lst[n + 1]
            get_combinations(str1, str2)
    print(word_combinations)
The set word_combinations will then contain the following (element order may vary, since sets are unordered):
{'I', 'I am', 'is', 'this is para'}
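Note that the code above only compares each paragraph with the next one and matches by substring, so it reports phrases shared by some adjacent pair rather than by every paragraph. If the goal is phrases that occur in all paragraphs, one possible variant is to collect the word sequences of each paragraph and intersect them as sets. The following is a minimal sketch under that assumption, not the answer's method; the function names ngrams, common_phrases, and maximal_phrases are hypothetical, and sentences are assumed to end with ".", "!" or "?".

```python
import re


def ngrams(paragraph):
    """Return every contiguous word sequence (as a tuple) found inside a sentence."""
    grams = set()
    # Assumption: sentences are delimited by '.', '!' or '?'
    for sentence in re.split(r"[.!?]+", paragraph):
        words = sentence.split()
        for i in range(len(words)):
            for j in range(i + 1, len(words) + 1):
                grams.add(tuple(words[i:j]))
    return grams


def _as_sorted_strings(grams):
    # Longest phrases first, then alphabetical
    return sorted((" ".join(g) for g in grams),
                  key=lambda s: (-len(s.split()), s))


def common_phrases(paragraphs):
    """Phrases that occur in every paragraph."""
    if not paragraphs:
        return []
    shared = set.intersection(*(ngrams(p) for p in paragraphs))
    return _as_sorted_strings(shared)


def maximal_phrases(paragraphs):
    """Shared phrases that are not contained in a longer shared phrase."""
    if not paragraphs:
        return []
    shared = set.intersection(*(ngrams(p) for p in paragraphs))

    def inside(small, big):
        n = len(small)
        return len(big) > n and any(big[k:k + n] == small
                                    for k in range(len(big) - n + 1))

    return _as_sorted_strings(
        g for g in shared if not any(inside(g, h) for h in shared))


paragraphs = [
    "this is para one. I am cat. I am 10 years old. I like fish",
    "this is para two. I am dog. my age is 12. I can swim",
    "this is para three. I am cat. I am 9 years. I like rat",
    "this is para four. I am rat. my age is secret. I hate cat",
    "this is para five. I am dog. I am 10 years old. I like fish",
]

print(common_phrases(paragraphs))
print(maximal_phrases(paragraphs))  # ['this is para', 'I am']
```

Matching whole words instead of substrings avoids counting "is" because it appears inside "this". The maximal filter drops "I" because it is part of "I am"; use common_phrases if shorter shared phrases such as "I" should be kept.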