Search for alphabetic pattern in text

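I have a variable whose value is some text, like $text = "I take a ball carefully to its place ...";. I want to find the longest substring of this string in which the words start with consecutive letters of the alphabet.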

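For example, the text above should return a ball carefully, because three consecutive words start with a..., b..., c... (a ball carefully). A valid pattern does not have to start with A; it can start from any letter, as in "go hell if jail".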
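To pin down the condition described above, here is a minimal Python sketch of a predicate (the name is_alphabetic_run is hypothetical) that checks whether a list of words forms such a run. It compares first characters by code point, so it is case-sensitive: "I" and "J" count as consecutive, "i" and "J" do not.

```python
def is_alphabetic_run(words):
    """Return True if each word starts with the letter right after the
    previous word's first letter (e.g. a, b, c or g, h, i, j)."""
    # Compare the first characters of neighbouring words by code point.
    # A single word (or an empty list) trivially counts as a run.
    return all(
        ord(nxt[0]) == ord(prev[0]) + 1
        for prev, nxt in zip(words, words[1:])
    )


print(is_alphabetic_run("a ball carefully".split()))   # True
print(is_alphabetic_run("go hell if jail".split()))    # True
print(is_alphabetic_run("ball carefully to".split()))  # False
```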

What is the most efficient way to search for it?

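Let's approach this by collecting every run of words whose first letters are in alphabetical order, then picking the longest candidate.

Here is a complete example in Java. You can copy and paste it into an online Java compiler to see the result.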

import java.util.*;

public class Main
{
    public static void main(String[] args) {
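        String input = "I take a ball carefully to its place go hell if jail";
        // "\\s+" is the regex for one or more whitespace characters;
        // trim() avoids an empty first token (charAt(0) would then throw).
        String[] arr = input.trim().split("\\s+");

        ArrayList<String> candidates = new ArrayList<>();

        String temp = "";
        String longest = "";
        char lastWord = 0; // first letter of the previous word
        int maxLen = 0;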

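        for (String s : arr) {

            if (s.charAt(0) == lastWord + 1) {
                // first letter follows the previous one: extend the current run
                temp += " " + s;
            } else {
                // order broken: save the finished run and start a new one
                candidates.add(temp);
                temp = s;
            }
            lastWord = s.charAt(0);
        }

        candidates.add(temp); // save the last run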
        
        
        for(String candid: candidates){
            
            String trim = candid.trim();
            if (trim.isEmpty())
                continue;
                
            int len = trim.split("\\s+").length; // number of words in this candidate

            if(len > maxLen) {
                longest = trim;
                maxLen = len;
            }
        }
        
        System.out.println("Candidates: " + candidates);
        System.out.println("Longest: " + longest);
    }
}
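For comparison, the same collect-then-pick-the-longest idea can be sketched in Python. The function names are hypothetical. Each run is kept as a list of words, so its word count is simply len(run) and no second split is needed; max() returns the first of several equally long runs, which matches the strict > comparison in the Java loop.

```python
def alphabetic_runs(words):
    """Yield each maximal run of words whose first letters are consecutive."""
    run = []
    for word in words:
        if run and ord(word[0]) == ord(run[-1][0]) + 1:
            run.append(word)      # letter follows: extend the current run
        else:
            if run:
                yield run         # the previous run is finished
            run = [word]          # start a new run with this word
    if run:
        yield run                 # the last run


def longest_alphabetic_run(text):
    words = text.split()          # split on any whitespace, no empty tokens
    runs = list(alphabetic_runs(words))
    if not runs:
        return ""
    return " ".join(max(runs, key=len))


print(longest_alphabetic_run("I take a ball carefully to its place go hell if jail"))
# go hell if jail
```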

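Approach:

While iterating over the text, compare the first character of each word, and increment a counter whenever the current character immediately follows the previous one in ASCII order. When the next character does not follow in ASCII order, reset the counter to 1. Keep track of the indices at which the counter reaches its maximum; the substring between those indices is the result.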
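A minimal Python sketch of this counter approach (the function name is hypothetical). Rather than storing both indices, it remembers where the best run ends and derives the start from the run length. It makes one O(n) pass over the words and, unlike the candidate-list approach, keeps only a few integers besides the word list.

```python
def longest_run_by_counter(text):
    """One pass: count consecutive first letters and remember where the
    longest run ends; no list of candidate runs is stored."""
    words = text.split()
    if not words:
        return ""
    count = 1        # length of the run ending at the current word
    best_count = 1   # longest run seen so far
    best_end = 0     # index of the last word of that run
    for i in range(1, len(words)):
        if ord(words[i][0]) == ord(words[i - 1][0]) + 1:
            count += 1   # next letter in ASCII order: extend the run
        else:
            count = 1    # order broken: reset the counter
        if count > best_count:
            best_count = count
            best_end = i
    best_start = best_end - best_count + 1
    return " ".join(words[best_start:best_end + 1])


print(longest_run_by_counter("I take a ball carefully to its place"))
# a ball carefully
```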

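Longest substring made of alphabetically ordered words:

Compute the length of each word and maximize the total length of the words in the run, for example: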

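import re

def get_longest_substring_with_alphabetically_sorted_words(text):
    if not text.strip():
        return ""
    words = re.split(r"\s+", text.strip())
    start = 0              # index of the first word of the current run
    start_max = 0          # bounds of the best run found so far
    end_max = 0
    ch = words[0][0]       # first letter of the previous word
    c = len(words[0])      # total characters in the current run
    mx = c                 # best total so far (the first word alone counts)
    for i in range(1, len(words)):
        if ord(ch) + 1 == ord(words[i][0]):
            c += len(words[i])    # letter follows: extend the run
        else:
            start = i             # order broken: start a new run here
            c = len(words[i])
        if c > mx:
            mx = c
            start_max = start
            end_max = i
        ch = words[i][0]
    return ' '.join(words[start_max:end_max + 1])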

Input:

1. text = "a bee can do ever for supercalifragilisticexpialidocious verb"
2. text = "a bee can do supercalifragilisticexpialidocious tea"

Output:

1. supercalifragilisticexpialidocious
2. supercalifragilisticexpialidocious tea

Longest list of alphabetically ordered words:

Count each word as one unit and maximize the number of words in the list, for example:

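import re

def get_longest_alphabetical_words(text):
    words = re.split(r"\s+", text.strip())
    c = 1                  # number of words in the current run
    start = 0              # index of the first word of the current run
    start_max = 0          # bounds of the longest run found so far
    end_max = 0
    ch = words[0][0]       # first letter of the previous word
    mx = 1                 # a single word is already a run of length 1
    for i in range(1, len(words)):
        if ord(ch) + 1 == ord(words[i][0]):
            c += 1         # letter follows: extend the run
            if c > mx:
                mx = c
                start_max = start
                end_max = i
        else:
            start = i      # order broken: start a new run here
            c = 1
        ch = words[i][0]

    return ' '.join(words[start_max:end_max + 1])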


text = 'I take a ball carefully to its place ... go hell if jail'
print(get_longest_alphabetical_words(text))

Input:

text = "I take a ball carefully to its place ... go hell if jail"

Output:

go hell if jail