Search for an alphabetic pattern in text
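I have a variable whose value is text such as $text = "I take a ball carefully to its place ...";
I want to find the longest substring of this string in which consecutive words start with consecutive letters of the alphabet.
For example, the text above should return the part a ball carefully, because its 3 words start with the pattern a... b... c.... A valid pattern does not have to start at A; it can start at any letter, as in "go hell if jail".
What is the most efficient way to search for it?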
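Let's try this by collecting every run of words whose initials are in alphabetical order, and then picking the longest of those candidates.
Here is a complete example in Java.
You can copy and paste it into an online Java compiler to see the result.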
import java.util.*;

public class Main {
    public static void main(String[] args) {
        String input = "I take a ball carefully to its place go hell if jail";
        // Split on whitespace; the backslash must be escaped inside a Java string literal.
        String[] arr = input.trim().split("\\s+");
        ArrayList<String> candidates = new ArrayList<>();
        String temp = "";
        String longest = "";
        char lastWord = 0;   // first character of the previous word
        int maxLen = 0;

        // Collect every run of words whose first characters are consecutive.
        for (String s : arr) {
            if (s.charAt(0) == lastWord + 1) {
                temp += " " + s;       // extend the current run
            } else {
                candidates.add(temp);  // close the current run
                temp = s;              // start a new run at this word
            }
            lastWord = s.charAt(0);
        }
        candidates.add(temp);

        // Pick the candidate with the most words.
        for (String candid : candidates) {
            String trim = candid.trim();
            if (trim.isEmpty())
                continue;
            int len = trim.split("\\s+").length; // number of words in the run
            if (len > maxLen) {
                longest = trim;
                maxLen = len;
            }
        }
        System.out.println("Candidates: " + candidates);
        System.out.println("Longest: " + longest);
    }
}
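Approach:
While iterating over the text, compare the first character of each word, and increment a counter whenever the current character immediately follows the previous one in ASCII order. When the next character does not follow in ASCII order, reset the counter to 1. Keep track of the indices at which the counter reaches its maximum value. The substring between those indices is the result.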
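The approach described above can be sketched roughly as follows. This is only a minimal illustration, not the code from any of the answers: the function name `longest_run_indices` is hypothetical, and it assumes the text has already been split into non-empty words and that the comparison is case-sensitive.

```python
def longest_run_indices(words):
    """Return (start, end) inclusive indices of the longest run of words
    whose first characters are consecutive code points, or None if empty."""
    if not words:
        return None
    best_start = best_end = 0    # indices of the best run seen so far
    start = 0                    # index where the current run began
    count = best = 1             # current and best run lengths, in words
    for i in range(1, len(words)):
        if ord(words[i][0]) == ord(words[i - 1][0]) + 1:
            count += 1           # this initial follows the previous one
        else:
            start, count = i, 1  # run broken: reset the counter to 1
        if count > best:
            best, best_start, best_end = count, start, i
    return best_start, best_end


words = "I take a ball carefully to its place".split()
start, end = longest_run_indices(words)
print(" ".join(words[start:end + 1]))  # a ball carefully
```

Returning indices rather than a joined string keeps the scan separate from how the result is presented; ties are resolved in favour of the earliest run.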
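Longest substring made up of alphabetically ordered words:
Count the length of each word and maximize the total length of the words, like this: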
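import re

def get_longest_substring_with_alphabetically_sorted_words(text):
    # Assumes text contains at least one word.
    # Strip first so that splitting never yields an empty word.
    words = re.split(r"\s+", text.strip())
    start = 0                 # first index of the current run
    start_max = end_max = 0   # bounds of the best run found so far
    ch = words[0][0]          # initial of the previous word
    c = len(words[0])         # total word length of the current run
    mx = c                    # best total so far (the first word alone)
    for i in range(1, len(words)):
        if ord(ch) + 1 == ord(words[i][0]):
            # The initial follows the previous one: extend the run.
            c += len(words[i])
        else:
            # Run broken: start a new run at this word.
            start = i
            c = len(words[i])
        if c > mx:
            mx = c
            start_max = start
            end_max = i
        ch = words[i][0]
    return ' '.join(words[start_max:end_max + 1])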
Input:
1. text = "a bee can do ever for supercalifragilisticexpialidocious verb"
2. text = "a bee can do supercalifragilisticexpialidocious tea"
Output:
1. supercalifragilisticexpialidocious
2. supercalifragilisticexpialidocious tea
Longest list of alphabetically ordered words:
Count each word as 1 unit and maximize the length of the word list, like this:
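import re

def get_longest_alphabetical_words(text):
    # Assumes text contains at least one word.
    words = re.split(r"\s+", text.strip())
    c = 1                     # number of words in the current run
    start = 0                 # first index of the current run
    start_max = end_max = 0   # bounds of the best run found so far
    ch = words[0][0]          # initial of the previous word
    mx = 1                    # best run length so far (the first word alone)
    for i in range(1, len(words)):
        if ord(ch) + 1 == ord(words[i][0]):
            c += 1
            if c > mx:
                mx = c
                start_max = start
                end_max = i
        else:
            # Run broken: start counting again from this word.
            start = i
            c = 1
        ch = words[i][0]
    return ' '.join(words[start_max:end_max + 1])

text = 'I take a ball carefully to its place ... go hell if jail'
print(get_longest_alphabetical_words(text))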
Input:
text = "I take a ball carefully to its place ... go hell if jail"
Output:
go hell if jail
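The two variants above differ only in how a run is scored: by total characters or by number of words. As a rough sketch of that relationship (the function `longest_run` and its `weight` parameter are hypothetical names introduced here, not part of either answer), the scoring rule can be passed in as a function:

```python
def longest_run(text, weight=lambda word: 1):
    """Return the best-scoring run of words with consecutive initials.

    `weight` scores a single word and a run's score is the sum of its
    words' weights. The default counts words; pass `len` to count characters.
    """
    words = text.split()
    if not words:
        return ""
    start = 0                      # index where the current run began
    score = weight(words[0])       # score of the current run
    best_score, best_start, best_end = score, 0, 0
    for i in range(1, len(words)):
        if ord(words[i][0]) == ord(words[i - 1][0]) + 1:
            score += weight(words[i])            # extend the current run
        else:
            start, score = i, weight(words[i])   # start a new run
        if score > best_score:
            best_score, best_start, best_end = score, start, i
    return " ".join(words[best_start:best_end + 1])


text = "a bee can do supercalifragilisticexpialidocious tea"
print(longest_run(text))              # a bee can do
print(longest_run(text, weight=len))  # supercalifragilisticexpialidocious tea
```

On this input the two scoring rules pick different runs, which is why it matters which definition of "longest" the question intends.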