流在排序频繁出现的单词中的替代方案

Question

所以，我有一个方法将字符串列表作为参数并读取它。然后按频率对它们进行排序，如果单词具有相同的频率，它们将按字母顺序打印。（实际上也有俄语单词，它们总是在英语单词下面）。

这是一个好的输出示例：

лицами-18
Apex-15
azet-15
xder-15
анатолю-15
андреевич-15
батальона-15
hello-13
zello-13
полноте-13

这是我的代码：

public class Words {

public String countWords(List<String> lines) {

    StringBuilder input = new StringBuilder();
    StringBuilder answer = new StringBuilder();

    for (String line : lines){
        if(line.length() > 3){
            if(line.substring(line.length() - 1).matches("[.?!,]+")){
                input.append(line.substring(0,line.length()-1)).append(" ");
            }else{
                input.append(line).append(" ");
            }
        }
    }

    String[] strings = input.toString().split("\s");

    List<String> list = new ArrayList<>(Arrays.asList(strings));

    Map<String, Integer> unsortMap = new HashMap<>();
    while (list.size() != 0){
        String word = list.get(0);
        int freq = Collections.frequency(list, word);
        if (word.length() >= 4 && freq >= 10){
            unsortMap.put(word.toLowerCase(), freq);
        }

        list.removeAll(Collections.singleton(word));
    }
    //The Stream logic is here
    List<String> sortedEntries = unsortMap.entrySet().stream()
            .sorted(Comparator.comparingLong(Map.Entry<String, Integer>::getValue)
                    .reversed()
                    .thenComparing(Map.Entry::getKey)
            )
            .map(it -> it.getKey() + " - " + it.getValue())
            .collect(Collectors.toList());
    
    //Logic ends here

    for (int i = 0; i < sortedEntries.size(); i++) {
        if(i<sortedEntries.size()-1) {
            answer.append(sortedEntries.get(i)).append("\n");
        }
        else{
            answer.append(sortedEntries.get(i));
        }
    }

    return answer.toString();

 }
}

我的问题：目前代码运行良好，并给出了成功的结果，但是如您所见，我正在使用流对字符串进行排序。但是，我只是感兴趣是否有其他解决方案可以在不使用流的情况下编写我的代码。更准确地说，是否有任何其他方法可以在不使用流的情况下按频率然后按字母顺序（如果它们具有相同的频率）对字符串进行排序。

Answer 1

您可以在流中执行的任何操作都可以在常规 Java 中执行。但是使用流通常会使代码更短、更简单、更易于阅读！

顺便说一下，您的代码的前半部分可以简单地替换为：

Map < String, AtomicInteger > map = new HashMap <>();
for ( String word : words ) {
    map.putIfAbsent( word , new AtomicInteger( 0 ) );
    map.get( word ).incrementAndGet();
}

代码的后半部分是通过先按值排序，然后按键排序来报告地图。

问题中讨论了该挑战，Sorting a HashMap based on Value then Key? and Sort a Map<Key, Value> by values. There are some clever solutions among those Answers, such as this one by Sean。

但我宁愿保持简单。我会将我们的单词和单词计数映射转换为我们自己自定义的对象 class，每个对象将单词和单词计数作为字段保存。

Java 16+ 带来了 records 特性，使得自定义 class 定义变得更加容易。记录是编写 class 的一种更简洁的方式，其主要目的是透明且不可变地传递数据。编译器隐式创建构造函数、getter、equals & hashCode 和 toString.

record WordAndCount (String word , int count ) {}

在 Java 16 之前，使用常规 class 代替 record。这是相当于该记录一行的 33 行源代码。

final class WordAndCount {
    private final String word;
    private final int count;

    WordAndCount ( String word , int count ) {
        this.word = word;
        this.count = count;
    }

    public String word () { return word; }

    public int count () { return count; }

    @Override
    public boolean equals ( Object obj ) {
        if ( obj == this ) return true;
        if ( obj == null || obj.getClass() != this.getClass() ) return false;
        var that = ( WordAndCount ) obj;
        return Objects.equals( this.word , that.word ) && this.count == that.count;
    }

    @Override
    public int hashCode () {
        return Objects.hash( word , count );
    }

    @Override
    public String toString () {
        return "WordAndCount[" + "word=" + word + ", " + "count=" + count + ']';
    }
}

我们创建一个该记录类型的对象数组，并填充。

List<WordAndCount> wordAndCounts = new ArrayList <>(map.size()) ;
for ( String word : map.keySet() ) {
    wordAndCounts.add( new WordAndCount( word, map.get( word ).get() ) );
}

现在排序。 Comparator 接口有一些方便的工厂方法，我们可以在其中传递方法引用。

wordAndCounts.sort(
        Comparator
                .comparingInt( WordAndCount ::count )
                .reversed()
                .thenComparing( WordAndCount ::word )
);

让我们把所有的代码放在一起。

package work.basil.text;

import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;

public class EngRus {
    public static void main ( String[] args ) {
        // Populate input data.
        List < String > words = EngRus.generateText(); // Recreate the original data seen in the Question.
        System.out.println( "words = " + words );

        // Count words in the input list.
        Map < String, AtomicInteger > map = new HashMap <>();
        for ( String word : words ) {
            map.putIfAbsent( word , new AtomicInteger( 0 ) );
            map.get( word ).incrementAndGet();
        }
        System.out.println( "map = " + map );

        // Report on word count, sorting first by word-count numerically and then by word alphabetically.
        record WordAndCount( String word , int count ) { }
        List < WordAndCount > wordAndCounts = new ArrayList <>( map.size() );
        for ( String word : map.keySet() ) {
            wordAndCounts.add( new WordAndCount( word , map.get( word ).get() ) );
        }
        wordAndCounts.sort( Comparator.comparingInt( WordAndCount :: count ).reversed().thenComparing( WordAndCount :: word ) );
        System.out.println( "wordAndCounts = " + wordAndCounts );
    }

    public static List < String > generateText () {
        String input = """
                лицами-18
                Apex-15
                azet-15
                xder-15
                анатолю-15
                андреевич-15
                батальона-15
                hello-13
                zello-13
                полноте-13
                """;

        List < String > words = new ArrayList <>();
        input.lines().forEach( line -> {
            String[] parts = line.split( "-" );
            for ( int i = 0 ; i < Integer.parseInt( parts[ 1 ] ) ; i++ ) {
                words.add( parts[ 0 ] );
            }
        } );
        Collections.shuffle( words );
        return words;
    }
}

当运行:

words = [андреевич, hello, xder, батальона, лицами, полноте, анатолю, лицами, полноте, полноте, анатолю, анатолю, zello, hello, лицами, xder, батальона, Apex, xder, андреевич, анатолю, hello, xder, Apex, xder, андреевич, лицами, zello, полноте, лицами, Apex, батальона, zello, полноте, xder, hello, azet, батальона, zello, hello, полноте, Apex, полноте, полноте, azet, андреевич, полноте, Apex, анатолю, hello, azet, лицами, анатолю, zello, анатолю, Apex, zello, андреевич, лицами, xder, hello, полноте, zello, Apex, батальона, лицами, hello, azet, Apex, анатолю, анатолю, zello, полноте, анатолю, Apex, батальона, андреевич, лицами, андреевич, azet, azet, лицами, лицами, zello, azet, анатолю, xder, батальона, полноте, лицами, hello, лицами, xder, xder, лицами, zello, андреевич, батальона, лицами, андреевич, azet, полноте, hello, андреевич, лицами, hello, Apex, батальона, hello, azet, лицами, zello, батальона, анатолю, Apex, azet, xder, андреевич, андреевич, батальона, анатолю, батальона, Apex, xder, azet, azet, xder, azet, анатолю, Apex, батальона, Apex, Apex, лицами, батальона, xder, батальона, hello, андреевич, андреевич, azet, zello, андреевич, xder, azet, анатолю, zello]

map = {андреевич=15, xder=15, zello=13, батальона=15, azet=15, лицами=18, анатолю=15, hello=13, Apex=15, полноте=13}

wordAndCounts = [WordAndCount[word=лицами, count=18], WordAndCount[word=Apex, count=15], WordAndCount[word=azet, count=15], WordAndCount[word=xder, count=15], WordAndCount[word=анатолю, count=15], WordAndCount[word=андреевич, count=15], WordAndCount[word=батальона, count=15], WordAndCount[word=hello, count=13], WordAndCount[word=zello, count=13], WordAndCount[word=полноте, count=13]]

流在排序频繁出现的单词中的替代方案

Alternative of streams in sorting frequently occurred words

java

sorting

collections

dictionary

stream