From strings without punctuation, search a master string and take slices with punctuation from it, without libraries: is it possible?

Question

I have this homework to do (no libraries allowed), and I underestimated the problem:

Say we have a list of strings: str_list = ["my head's", "free", "at last", "into alarm", "in another moment", "neck"]

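What we know for sure is that every string is contained in master_string, that they are in order, and that they have no punctuation. (All of this is thanks to checks I did earlier.)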

Then we have the string: master_string = "'Come, my head's free at last!' said Alice in a tone of delight, which changed into alarm in another moment, when she found that her shoulders were nowhere to be found: all she could see, when she looked down, was an immense length of neck, which seemed to rise like a stalk out of a sea of green leaves that lay far below her."

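What I basically have to do here is find the sequences in master_string that are made of at least k consecutive strings from str_list (in this case k = 2). However, I underestimated the fact that each string in str_list can contain more than one word, so master_string.split() won't get me anywhere: it would mean asking whether "my head's" == "my", which is of course false.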

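I was thinking of doing something like concatenating the strings one at a time and searching in master_string.strip(".,:;!?"), but if I find a matching sequence I absolutely need to take it directly from master_string, because I need the punctuation in the result variable. That basically means taking slices straight from master_string, but how is that possible? Is it even possible, or do I have to change my approach? This is driving me completely mad, especially since no libraries are allowed.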
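The slicing idea described above can be sketched as follows. This is only a minimal illustration, not part of the original question: the helper names strip_with_index_map and slice_with_punctuation are hypothetical, and the punctuation set is assumed to be the one passed to strip() above. Note that str.strip only removes characters from both ends of a string, so the sketch builds the punctuation-free copy character by character and remembers where each kept character came from.

```python
PUNCTUATION = ".,:;!?"  # assumed punctuation set, copied from the strip() call above

def strip_with_index_map(text, punctuation=PUNCTUATION):
    """Return (clean, index_map): clean is text without punctuation,
    and index_map[i] is the position in text of clean[i]."""
    clean_chars = []
    index_map = []
    for position, char in enumerate(text):
        if char not in punctuation:
            clean_chars.append(char)
            index_map.append(position)
    return "".join(clean_chars), index_map

def slice_with_punctuation(text, phrase, punctuation=PUNCTUATION):
    """Find a punctuation-free phrase and return the matching slice of text,
    including punctuation inside and directly after the match.
    Returns None when the phrase is empty or not found."""
    if not phrase:
        return None
    clean, index_map = strip_with_index_map(text, punctuation)
    start = clean.find(phrase)
    if start == -1:
        return None
    first = index_map[start]                     # first matched character in text
    last = index_map[start + len(phrase) - 1]    # last matched character in text
    end = last + 1
    # Extend the slice over punctuation that directly follows the match.
    while end < len(text) and text[end] in punctuation:
        end += 1
    return text[first:end]

sample = "Come, my head's free at last! said Alice"
print(slice_with_punctuation(sample, "my head's free at last"))
```

The index map is what makes it possible to search in the cleaned copy and still cut the slice out of the original string.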

In case you are wondering what the expected result is here:

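["my head's free at last!", "into alarm in another moment,"] (because both satisfy the condition of at least k strings from str_list), and "neck" will be saved in discard_list because it does not satisfy that condition (I can't just discard it with .pop(), because I need to do other things with the discarded variable).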

Answer 1

I have two different versions. Number 1 gives you neck :(, but number 2 not so much. Here is number 1:

master_string = "'Come, my head's free at last!' said Alice in a tone of delight, which changed into alarm in another moment, when she found that her shoulders were nowhere to be found: all she could see, when she looked down, was an immense length of neck, which seemed to rise like a stalk out of a sea of green leaves that lay far below her."

str_list = ["my head's", "free", "at last", "into alarm", "in another moment", "neck"]

new_str = ''

# collect every string from str_list that occurs in master_string
for word in str_list:
    if word in master_string:
        new_str += word + ' '

print(new_str)

Here is number 2:

master_string = "'Come, my head's free at last!' said Alice in a tone of delight, which changed into alarm in another moment, when she found that her shoulders were nowhere to be found: all she could see, when she looked down, was an immense length of neck, which seemed to rise like a stalk out of a sea of green leaves that lay far below her."

str_list = ["my head's", "free", "at last", "into alarm", "in another moment", "neck"]

new_str = ''

# keep only the matches that consist of exactly two words
for word in str_list:
    if word in master_string:
        new_word = word.split(' ')
        if len(new_word) == 2:
            new_str += word + ' '

print(new_str)

Answer 2

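Here is my solution:

Try to extend every string based on master_string and a limited set of punctuation characters (e.g. my head’s -> my head’s free at last!; free -> free at last!).
Keep only the substrings that were extended at least k times.
Remove the redundant substrings (e.g. free at last! already appears within my head’s free at last!).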

Here is the code:

str_list = ["my head’s", "free", "at last", "into alarm", "in another moment", "neck"]
master_string = "‘Come, my head’s free at last!’ said Alice in a tone of delight, which changed into alarm in another moment, when she found that her shoulders were nowhere to be found: all she could see, when she looked down, was an immense length of neck, which seemed to rise like a stalk out of a sea of green leaves that lay far below her."
punctuation_characters = ".,:;!?"  # punctuation marks that may follow a string
k = 1  # minimum number of successors, i.e. at least k + 1 strings per sequence

def extend_string(current_str, successors_num=0):
    # check if the next token is a punctuation mark
    for punctuation_mark in punctuation_characters:
        if current_str + punctuation_mark in master_string:
            return extend_string(current_str + punctuation_mark, successors_num)

    # check if the next token is a proper successor
    for successor in str_list:
        if current_str + " " + successor in master_string:
            return extend_string(current_str + " " + successor, successors_num + 1)

    # cannot extend the string anymore
    return current_str, successors_num

extended_strings = []
for s in str_list:
    extended_string, successors_num = extend_string(s)
    if successors_num >= k:
        extended_strings.append(extended_string)

extended_strings.sort(key=len)  # sorting by ascending length
result_list = []
for es in extended_strings:
    # drop shorter results that are already contained in the current one
    result_list = [s2 for s2 in result_list if s2 not in es]
    result_list.append(es)
print(result_list)      # result: ['my head’s free at last!', 'into alarm in another moment,']
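As a complement to the code above, here is a minimal sketch of how the discard_list from the question could be produced as well. It is not the answerer's method: group_runs is a hypothetical helper, and it assumes that every string occurs verbatim and in order in master_string, and that a gap made only of spaces or punctuation does not break a sequence. Here k counts strings, as in the question (k = 2), and the sentence is shortened for readability.

```python
PUNCTUATION = ".,:;!?"  # assumed punctuation set, same as in the answer above

def group_runs(str_list, master_string, k, punctuation=PUNCTUATION):
    """Group adjacent phrases into runs and return (result, discard_list)."""
    # Locate each phrase left to right; the question guarantees order and presence.
    spans = []
    cursor = 0
    for phrase in str_list:
        start = master_string.find(phrase, cursor)
        if start == -1:
            raise ValueError("phrase not found in order: " + phrase)
        cursor = start + len(phrase)
        spans.append((start, cursor, phrase))

    # Two phrases belong to the same run when only spaces or punctuation separate them.
    runs = []
    for span in spans:
        if runs:
            previous_end = runs[-1][-1][1]
            gap = master_string[previous_end:span[0]]
            if all(ch == " " or ch in punctuation for ch in gap):
                runs[-1].append(span)
                continue
        runs.append([span])

    result, discard_list = [], []
    for run in runs:
        if len(run) >= k:
            start, end = run[0][0], run[-1][1]
            # Extend the slice over punctuation that directly follows the run.
            while end < len(master_string) and master_string[end] in punctuation:
                end += 1
            result.append(master_string[start:end])
        else:
            # Runs that are too short are kept for later use instead of being dropped.
            discard_list.extend(phrase for _, _, phrase in run)
    return result, discard_list

str_list = ["my head's", "free", "at last", "into alarm", "in another moment", "neck"]
# Shortened version of the sentence from the question.
master_string = ("'Come, my head's free at last!' said Alice in a tone of delight, "
                 "which changed into alarm in another moment, when she looked down, "
                 "was an immense length of neck.")
result, discard_list = group_runs(str_list, master_string, k=2)
print(result)
print(discard_list)
```

With this shortened sentence, result holds the two sequences expected in the question and discard_list holds only "neck". Because str.find matches substrings, a string such as free would also match inside a longer word; word-boundary checks are left out of this sketch.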

python

string

slice