Searching a master string using strings without punctuation, then taking slices with punctuation from it, without libraries: is it possible?

I have this homework to do (no libraries allowed) and I underestimated the problem:

Let's say we have a list of strings: str_list = ["my head's", "free", "at last", "into alarm", "in another moment", "neck"]

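What we know for sure is that every string in the list is contained in master_string, that the strings are in order, and that they have no punctuation. (All of this is thanks to checks I made earlier.)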

Then we have the string: master_string = "'Come, my head's free at last!' said Alice in a tone of delight, which changed into alarm in another moment, when she found that her shoulders were nowhere to be found: all she could see, when she looked down, was an immense length of neck, which seemed to rise like a stalk out of a sea of green leaves that lay far below her."

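What I basically have to do here is check for sequences of at least k strings from str_list that are contained in master_string (in this case k = 2). However, I underestimated the fact that each string in str_list can contain more than one word, so master_string.split() would not get me anywhere, because it would mean asking something like if "my head's" == "my", which is of course false.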

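I was thinking about doing something like concatenating the strings one at a time and searching in master_string.strip(".,:;!?"), but if I find matching sequences I absolutely need to take them directly from master_string, because I need the punctuation in the result variable. This basically means taking slices directly from master_string, but how can that be done? Is it even possible, or do I have to change approach? This is driving me completely crazy, especially since no libraries are allowed.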
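As a side note, str.strip only removes characters from the two ends of a string, so it would leave the punctuation inside the sentence untouched. The slicing idea itself can be sketched with str.find, which returns the start index of a match. This is only a minimal sketch under assumptions: the helper name slice_with_punctuation and the shortened sample sentence are hypothetical and not part of the assignment.

```python
PUNCTUATION = ".,:;!?"  # assumed set of punctuation marks


def slice_with_punctuation(master, phrase, marks=PUNCTUATION):
    """Return the slice of `master` matching `phrase`, plus any
    punctuation that directly follows it, or None if there is no match."""
    start = master.find(phrase)  # -1 when the phrase does not occur
    if start == -1:
        return None
    end = start + len(phrase)
    # Grow the slice while the next character is a punctuation mark.
    while end < len(master) and master[end] in marks:
        end += 1
    return master[start:end]


# Hypothetical, shortened sample text (not the full master_string).
sample = "Come, my head's free at last! said Alice"

# strip() only trims the ends, so the inner comma stays.
print("a, b!".strip(PUNCTUATION))

# The slice is taken from the original text, punctuation included.
print(slice_with_punctuation(sample, "free at last"))
```

Because the slice comes from the original string, no punctuation has to be removed before searching, as long as the search strings themselves contain no punctuation.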

In case you are wondering what the expected result is here:

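["my head's free at last!", "into alarm in another moment,"] (because both satisfy the condition of at least k strings from str_list), and "neck" would be saved in discard_list because it does not satisfy that condition (it must not be discarded with .pop(), because I need to do other things with the discarded values).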

I have two different versions: number 1 also gives you "neck" :(, while number 2 gives you less. This is number 1:

master_string = "Come, my head’s free at last!’ said Alice in a tone of delight, which changed into alarm in another moment, when she found that her shoulders were nowhere to be found: all she could see, when she looked down, was an immense length of neck, which seemed to rise like a stalk out of a sea of green leaves that lay far below her."

str_list = ["my head’s", "free", "at last", "into alarm", "in another moment", "neck"]

new_str = ''

# keep every string that occurs anywhere in master_string
for word in str_list:
    if word in master_string:
        new_str += word + ' '

print(new_str)

This is number 2:

master_string = "Come, my head’s free at last!’ said Alice in a tone of delight, which changed into alarm in another moment, when she found that her shoulders were nowhere to be found: all she could see, when she looked down, was an immense length of neck, which seemed to rise like a stalk out of a sea of green leaves that lay far below her."

str_list = ["my head’s", "free", "at last", "into alarm", "in another moment", "neck"]

new_str = ''

# keep only the matching strings that consist of exactly two words
for word in str_list:
    if word in master_string:
        new_word = word.split(' ')
        if len(new_word) == 2:
            new_str += word + ' '

print(new_str)

Here is my solution:

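  1. Try to extend every string, based on master_string and a finite set of punctuation characters (e.g. my head’s -> my head’s free at last!; free -> free at last!).
  2. Keep only the substrings that were extended at least k times.
  3. Remove redundant substrings (e.g. free at last! is already present within my head’s free at last!).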

Here is the code:

str_list = ["my head’s", "free", "at last", "into alarm", "in another moment", "neck"]
master_string = "‘Come, my head’s free at last!’ said Alice in a tone of delight, which changed into alarm in another moment, when she found that her shoulders were nowhere to be found: all she could see, when she looked down, was an immense length of neck, which seemed to rise like a stalk out of a sea of green leaves that lay far below her."
punctuation_characters = ".,:;!?"  # list of punctuation characters
k = 1  # minimum number of successors, i.e. at least k + 1 strings from str_list in a row

def extend_string(current_str, successors_num=0):
    """Extend current_str as far as master_string allows.

    Returns the extended string and the number of strings from
    str_list that were appended to it.
    """
    # check if the next token is a punctuation mark
    for punctuation_mark in punctuation_characters:
        if current_str + punctuation_mark in master_string:
            return extend_string(current_str + punctuation_mark, successors_num)

    # check if the next token is a proper successor
    for successor in str_list:
        if current_str + " " + successor in master_string:
            return extend_string(current_str + " " + successor, successors_num + 1)

    # cannot extend the string anymore
    return current_str, successors_num

# steps 1 and 2: extend every string and keep those with at least k successors
extended_strings = []
for s in str_list:
    extended_string, successors_num = extend_string(s)
    if successors_num >= k:
        extended_strings.append(extended_string)

# step 3: drop every extended string that is contained in a longer one
extended_strings.sort(key=len)  # sorting by ascending length
result_list = []
for es in extended_strings:
    result_list = [s2 for s2 in result_list if s2 not in es]
    result_list.append(es)
print(result_list)  # result: ['my head’s free at last!', 'into alarm in another moment,']
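
The question also asks for a discard_list, which the code above does not build. One possible way to derive it is sketched below. It assumes that a string counts as discarded when it is not part of any kept sequence, and it uses a plain substring check, which could misjudge a string that happens to occur inside a kept sequence for another reason. The data is restated so that the sketch runs on its own.

```python
# Data restated from the example above so this sketch is self-contained.
str_list = ["my head’s", "free", "at last", "into alarm", "in another moment", "neck"]
result_list = ["my head’s free at last!", "into alarm in another moment,"]

# Assumption: a string is discarded when no kept sequence contains it.
discard_list = [s for s in str_list if not any(s in r for r in result_list)]

print(discard_list)  # prints: ['neck']
```

Building a new list this way leaves str_list untouched, so nothing has to be removed with .pop().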