Substring manipulation in Java - Find the longest word made from the other words

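I need to read the contents of a file and find the longest word that can be formed from the other words present in the file. The words in the file are space-separated. For example:

Input from the file: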

This is example an anexample Thisisanexample Thisistheexample

Output:

Thisisanexample

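Note: the longest word formed is Thisisanexample and not Thisistheexample, because the word the does not appear as a separate word in the file.

Can this be done using simple arrays? This is what I have done so far: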

try{
        File file = new File(args[0]);  //command line argument for file path
        br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
        String line = null;
        //array for each word
        String[] words = new String[] {}; 
        while ((line = br.readLine()) != null){
            words = line.split("\\s+"); //splitting the string on whitespace
        }
        // array to store length of each word
        int[] wordLength = new int[words.length]; 
        for(int i = 0; i < words.length; i++){
            wordLength[i] = words[i].length();
        }

        int currLength = 0; //store length of current word
        int maxLength = 0;  //store length of max word
        String maxWord = null;

        //checking each word with others at O(n*n) complexity
        for (int i = 0; i < words.length; i++){
            currLength = 0;
            for (int j = 0; j < words.length && j != i; j++){
                if (words[i].contains(words[j])){
                    currLength += wordLength[j];
                }
            }
            System.out.println(currLength);
            if(currLength > maxLength){
                maxLength = currLength;
                maxWord = words[i];
            }
        }
        System.out.println(maxWord);
    }

But this does not work when one substring is itself contained in another substring. For the following input, it gives the wrong output:

This is example an anexample Thisisanexample Thisisanexample2

The output should be Thisisanexample, but it gives Thisisanexample2.
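To make the failure concrete, here is a minimal sketch of the length-summing idea. The class and method names (`ContainsOvercount`, `containedLengthSum`) are made up for illustration, and the helper checks every other word rather than reproducing the loop above exactly. It suggests why the score is misleading: nested words such as an, example and anexample are all counted, so the sum can exceed the length of the word being scored.

```java
class ContainsOvercount {
    // Sums the lengths of all other words that occur anywhere inside target.
    // Hypothetical helper: a simplified stand-in for the scoring loop in the question.
    static int containedLengthSum(String target, String[] words) {
        int sum = 0;
        for (String w : words) {
            if (!w.equals(target) && target.contains(w)) {
                sum += w.length();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] words =
            "This is example an anexample Thisisanexample Thisisanexample2".split("\\s+");
        // The word is only 16 characters long, yet its score is 39.
        System.out.println(containedLengthSum("Thisisanexample2", words)); // prints 39
        System.out.println(containedLengthSum("Thisisanexample", words));  // prints 24
    }
}
```

Because Thisisanexample2 contains Thisisanexample as well as all of its parts, it gets the higher score even though the trailing 2 is not a word in the file.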

With help from other Stack Overflow threads, I managed to do this using only arrays.

The solution is as follows:

import java.io.*;
import java.util.*;

public class LongestWord implements Comparator<String> {
    // Comparator used to sort the array by word length, longest first
    public int compare(String s1, String s2) {
        return Integer.compare(s2.length(), s1.length());
    }

    public static void main(String[] args) {
        BufferedReader br = null;
        try {
            File file = new File(args[0]); // command line argument for file path
            br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
            String line = null;
            // array for each word
            String[] words = new String[] {};
            while ((line = br.readLine()) != null) {
                // split the line on whitespace; note that only the last line read is kept
                words = line.split("\\s+");
            }

            // sort the array according to length of words in descending order
            Arrays.sort(words, new LongestWord());

            /* Start with the longest word in the array and check whether the other words are its substrings.
             * If a word is a substring, remove that part from the superstring.
             * Finally, if the superstring length is 0, it is the longest word that can be formed. */
            for (String superString : words) {
                // keep a copy of the current superstring, since parts of superString are removed below
                String current = superString;
                for (String subString : words) {
                    // superString contains subString, and subString is not the word itself
                    if (!subString.equals(current) && superString.contains(subString)) {
                        // remove the substring part from the superstring
                        superString = superString.replace(subString, "");
                    }
                }

                if (superString.length() == 0) {
                    System.out.println(current);
                    // since the array is sorted, the first word reduced to length 0 is the longest word formed
                    break;
                }
            }
        } catch (FileNotFoundException fex) {
            System.out.println("File not found");
            return;
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (br != null) {
                    br.close();
                }
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
    }
}
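One caveat with removing substrings: the leftover pieces are joined together, so unrelated fragments can combine into a new word. With the words abcd, bc and ad, removing bc from abcd leaves ad, which is then removed as well, so abcd is reported even though it is not a concatenation of the other words. As an alternative, the sketch below swaps in a different technique, the standard "word break" dynamic programming check over a `HashSet`, so it goes beyond plain arrays. The names `LongestCompoundWord`, `isComposed` and `longestCompound` are made up for this example, and the input is hard-coded instead of being read from a file.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class LongestCompoundWord {

    // True if word can be split into two or more pieces that are all in the dictionary.
    static boolean isComposed(String word, Set<String> dictionary) {
        int n = word.length();
        // reachable[k] is true when the first k characters split cleanly into dictionary words
        boolean[] reachable = new boolean[n + 1];
        reachable[0] = true;
        for (int end = 1; end <= n; end++) {
            for (int start = 0; start < end; start++) {
                // skip the trivial "split" that uses the whole word as a single piece
                if (start == 0 && end == n) {
                    continue;
                }
                if (reachable[start] && dictionary.contains(word.substring(start, end))) {
                    reachable[end] = true;
                    break;
                }
            }
        }
        return n > 0 && reachable[n];
    }

    // Returns the longest word composed of other words, or null if there is none.
    static String longestCompound(String[] words) {
        Set<String> dictionary = new HashSet<>(Arrays.asList(words));
        String best = null;
        for (String word : words) {
            boolean longer = best == null || word.length() > best.length();
            if (longer && isComposed(word, dictionary)) {
                best = word;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String input = "This is example an anexample Thisisanexample Thisistheexample";
        System.out.println(longestCompound(input.split("\\s+"))); // prints Thisisanexample
    }
}
```

Because every piece must start exactly where the previous one ended, a word like abcd is rejected here, and Thisisanexample2 is rejected because nothing in the file covers the trailing 2.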

With just a few lines of code, you can use a regular expression to find the candidate "combination" words, then use simple logic to find the longest match:

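// requires java.util.regex.Matcher and java.util.regex.Pattern
String longest = "";
// \\b is a word boundary; an unescaped "\b" in a Java string literal would be a backspace character
Matcher m = Pattern.compile("(?i)\\b(this|is|an|example)+\\b").matcher(input);
while (m.find())
    if (m.group().length() > longest.length())
        longest = m.group();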

Apart from the code that reads the file and assigns the string to the variable input, this is all the code you need.
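The pattern above hard-codes the building-block words. As a rough sketch of how the same idea might be applied without knowing the words in advance, the pattern can be built per candidate from all the other words in the input. This is an assumption about how one could generalize the answer, not part of it. `RegexLongestWord` and `longest` are made-up names, `Pattern.quote` is used so that words containing regex metacharacters are treated literally, and `matches()` replaces the `\b` anchors because each candidate is tested on its own.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class RegexLongestWord {

    // Returns the longest word that is a concatenation of other words, or "" if none.
    static String longest(String input) {
        String[] words = input.trim().split("\\s+");
        String longest = "";
        for (String candidate : words) {
            if (candidate.length() <= longest.length()) {
                continue; // cannot improve on the current best
            }
            // alternation of every other word, each quoted so it is matched literally
            String alternatives = Arrays.stream(words)
                    .filter(w -> !w.equals(candidate))
                    .map(Pattern::quote)
                    .collect(Collectors.joining("|"));
            if (alternatives.isEmpty()) {
                continue; // no other words to build from
            }
            // the whole candidate must be one or more of the other words, ignoring case
            Pattern p = Pattern.compile("(?i)(?:" + alternatives + ")+");
            if (p.matcher(candidate).matches()) {
                longest = candidate;
            }
        }
        return longest;
    }

    public static void main(String[] args) {
        String input = "This is example an anexample Thisisanexample Thisistheexample";
        System.out.println(longest(input)); // prints Thisisanexample
    }
}
```

This compiles one pattern per candidate and relies on regex backtracking, so it is likely to be slow on large files; it is meant only to show the shape of the approach.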